{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Titanic Survival Prediction Report"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## I. Problem Description"
   ]
  },
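  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 of the 2224 passengers and crew. This sensational tragedy shocked the international community and led to better ship safety regulations.\n",
    "\n",
    "One reason the shipwreck cost so many lives is that there were not enough lifeboats for the passengers and crew. Although surviving the sinking involved some luck, certain groups were more likely to survive than others, such as women, children, and the upper class.\n",
    "\n",
    "In this challenge, we ask you to complete an analysis of what sorts of people were likely to survive. In particular, we ask you to apply machine learning tools to predict which passengers survived the tragedy."
   ]
  },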
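  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## II. Purpose of the Report"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using the provided training set, this report analyzes and constructs features, selects a model, fits it to the training data to obtain a good fit, and then predicts on the test data and compares the predictions with the reference labels to obtain an accuracy. The underlying goal is to learn how machine learning methods integrate large amounts of data, analyze correlations, and make predictions, and to practice report writing while completing the code."
   ]
  },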
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## III. Background"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Using pandas"
   ]
  },
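  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "pandas is a tool built on NumPy, created for data analysis tasks. It incorporates a number of libraries and standard data models, and provides the tools needed to work efficiently with large data sets, along with many functions and methods for handling data quickly and conveniently. It is one of the main reasons Python is a powerful and efficient environment for data analysis.\n",
    "\n",
    "pandas has historically offered three data structures:\n",
    "\n",
    "- Series: a one-dimensional array, similar to a one-dimensional NumPy array. Both resemble Python's built-in list; the difference is that a list may hold elements of different types, whereas an array or a Series normally stores a single data type, which uses memory more efficiently and speeds up computation.\n",
    "- DataFrame: a two-dimensional tabular data structure, similar in many ways to R's data.frame. A DataFrame can be thought of as a container of Series. The rest of this report mainly uses DataFrame.\n",
    "- Panel: a three-dimensional array that can be thought of as a container of DataFrames. Panel was deprecated and then removed in pandas 0.25, so it is not available in current versions."
   ]
  },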
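  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This exercise mainly uses the file-reading function `pd.read_csv()`, which reads delimited text files such as csv, txt, and tsv (with the `sep` argument set for delimiters other than a comma) and returns a DataFrame object."
   ]
  },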
  {
   "cell_type": "code",
   "execution_count": 234,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 234,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "train = pd.read_csv(\"train.csv\")\n",
    "type(train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first five rows of a DataFrame can be displayed with `.head()`; pass a number in the parentheses to choose how many rows to show."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 235,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen, Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PassengerId  Survived  Pclass  \\\n",
       "0            1         0       3   \n",
       "1            2         1       1   \n",
       "2            3         1       3   \n",
       "3            4         1       1   \n",
       "4            5         0       3   \n",
       "\n",
       "                                                Name     Sex   Age  SibSp  \\\n",
       "0                            Braund, Mr. Owen Harris    male  22.0      1   \n",
       "1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   \n",
       "2                             Heikkinen, Miss. Laina  female  26.0      0   \n",
       "3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   \n",
       "4                           Allen, Mr. William Henry    male  35.0      0   \n",
       "\n",
       "   Parch            Ticket     Fare Cabin Embarked  \n",
       "0      0         A/5 21171   7.2500   NaN        S  \n",
       "1      0          PC 17599  71.2833   C85        C  \n",
       "2      0  STON/O2. 3101282   7.9250   NaN        S  \n",
       "3      0            113803  53.1000  C123        S  \n",
       "4      0            373450   8.0500   NaN        S  "
      ]
     },
     "execution_count": 235,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 237,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PassengerId  Survived  Pclass  \\\n",
       "0            1         0       3   \n",
       "1            2         1       1   \n",
       "2            3         1       3   \n",
       "\n",
       "                                                Name     Sex   Age  SibSp  \\\n",
       "0                            Braund, Mr. Owen Harris    male  22.0      1   \n",
       "1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   \n",
       "2                             Heikkinen, Miss. Laina  female  26.0      0   \n",
       "\n",
       "   Parch            Ticket     Fare Cabin Embarked  \n",
       "0      0         A/5 21171   7.2500   NaN        S  \n",
       "1      0          PC 17599  71.2833   C85        C  \n",
       "2      0  STON/O2. 3101282   7.9250   NaN        S  "
      ]
     },
     "execution_count": 237,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`.info()` summarizes the basic information about the table (dimensions, column names, data types, memory usage, and so on)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 238,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 891 entries, 0 to 890\n",
      "Data columns (total 12 columns):\n",
      " #   Column       Non-Null Count  Dtype  \n",
      "---  ------       --------------  -----  \n",
      " 0   PassengerId  891 non-null    int64  \n",
      " 1   Survived     891 non-null    int64  \n",
      " 2   Pclass       891 non-null    int64  \n",
      " 3   Name         891 non-null    object \n",
      " 4   Sex          891 non-null    object \n",
      " 5   Age          714 non-null    float64\n",
      " 6   SibSp        891 non-null    int64  \n",
      " 7   Parch        891 non-null    int64  \n",
      " 8   Ticket       891 non-null    object \n",
      " 9   Fare         891 non-null    float64\n",
      " 10  Cabin        204 non-null    object \n",
      " 11  Embarked     889 non-null    object \n",
      "dtypes: float64(2), int64(5), object(5)\n",
      "memory usage: 83.7+ KB\n"
     ]
    }
   ],
   "source": [
    "train.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When analyzing and organizing the feature data, we also used `.map()` and the `cut` function to group the data into categories."
   ]
  },
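  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of how `.map()` and `pd.cut` can group values, using five rows copied from the `train.head()` output above. The 0/1 coding, the bin edges, and the labels are assumptions chosen for illustration, not necessarily the ones used later in this report.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Five rows copied from the train.head() output shown earlier\n",
    "df = pd.DataFrame({\n",
    "    'Sex': ['male', 'female', 'female', 'female', 'male'],\n",
    "    'Age': [22.0, 38.0, 26.0, 35.0, 35.0],\n",
    "})\n",
    "\n",
    "# .map() replaces each value through a lookup table (this 0/1 coding is an assumption)\n",
    "df['SexCode'] = df['Sex'].map({'male': 0, 'female': 1})\n",
    "\n",
    "# pd.cut() assigns each value to an interval; these bin edges and labels are hypothetical\n",
    "df['AgeGroup'] = pd.cut(df['Age'], bins=[0, 18, 30, 60, 100],\n",
    "                        labels=['child', 'young', 'adult', 'senior'])\n",
    "print(df)\n",
    "```\n",
    "\n",
    "With the default `right=True`, each bin includes its right edge, so an age of exactly 30 would fall in `young`."
   ]
  },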
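  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Random forest algorithm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A random forest is essentially an ensemble of many decision trees.\n",
    "Different training sets are obtained by resampling the original samples, a learner is trained on each of these new sets, and the results of all learners are combined into the final result; every sample carries the same weight. The procedure is as follows:\n",
    "\n",
    "![Algorithm diagram](./原理图.png)\n",
    "\n",
    "Algorithm:\n",
    "\n",
    "- Input: sample set $D=\\{(x_1,y_1),(x_2,y_2),\\dots,(x_m,y_m)\\}$ and the number of weak learners (iterations) $T$.\n",
    "- Output: the final strong learner $f(x)$.\n",
    "- Procedure:\n",
    "  1. For $t=1,2,\\dots,T$:\n",
    "     - a) Draw the $t$-th random sample from the training set, sampling $m$ times with replacement, to obtain a sample set $D_t$ containing $m$ samples.\n",
    "     - b) Train the $t$-th decision tree $\\psi_t$ on $D_t$. When splitting a node, randomly select a subset of the features available at that node, then choose the best feature within that subset to split the node into left and right subtrees.\n",
    "  2. For classification, the $T$ weak learners vote, and the class with the most votes (or one of them in case of a tie) is the final class. For regression, the arithmetic mean of the $T$ weak learners' outputs, $$f(x)=\\frac{1}{T}\\sum_{t=1}^{T}\\psi_t(x)$$ is the final model output."
   ]
  },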
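  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sampling and aggregation steps of the procedure above can be sketched in plain Python. This is only an illustration under simplifying assumptions: the helper names `bootstrap_sample`, `majority_vote`, and `average` are hypothetical, the data are toy values, and no decision trees are actually trained.\n",
    "\n",
    "```python\n",
    "import random\n",
    "from collections import Counter\n",
    "\n",
    "def bootstrap_sample(data, rng):\n",
    "    # Draw len(data) items with replacement to form one sample set D_t\n",
    "    return [rng.choice(data) for _ in data]\n",
    "\n",
    "def majority_vote(predictions):\n",
    "    # Classification: return the class predicted by the most weak learners\n",
    "    return Counter(predictions).most_common(1)[0][0]\n",
    "\n",
    "def average(predictions):\n",
    "    # Regression: arithmetic mean of the T weak learners' outputs\n",
    "    return sum(predictions) / len(predictions)\n",
    "\n",
    "rng = random.Random(0)\n",
    "D = [(1, 'a'), (2, 'b'), (3, 'a'), (4, 'b')]  # toy (x, y) pairs, purely illustrative\n",
    "D_t = bootstrap_sample(D, rng)\n",
    "print(len(D_t))                      # same size m as D\n",
    "print(majority_vote([1, 0, 1, 1]))   # class 1 wins three votes to one\n",
    "print(average([2.0, 4.0, 6.0]))      # 4.0\n",
    "```\n",
    "\n",
    "Because sampling is done with replacement, some pairs may appear more than once in `D_t` and others not at all, which is what makes the trees differ from one another."
   ]
  },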
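  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## IV. Experimental Approach"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The experiment is carried out in the following six parts.\n",
    "\n",
    "### 1. Problem definition"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first step is to define the problem. This experiment uses the passenger features given in the training set to analyze the relationship between those features and the target `Survived`; once the model is built, it predicts survival for the passengers in the test set."
   ]
  },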
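  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Collecting the available data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data package for the experiment contains three data sets: a training set, a test set, and a validation set."
   ]
  },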
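  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Data processing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. Inspect what the data contain, then find the missing values and fill them in sensibly.\n",
    "2. Analyze the data, keep the usable columns, and remove the unusable ones.\n",
    "3. Encode the data into the numeric form that the algorithm can use.\n",
    "4. Where appropriate, build new features from the existing ones."
   ]
  },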
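  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The four steps above might look roughly like this on a tiny table. The rows come from the `train.head()` output shown earlier, except that one `Age` value is blanked out on purpose so there is something to fill. Filling with the median, the 0/1 coding, and the `FamilySize` feature are assumptions for illustration, not necessarily the choices made in this report.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Rows from train.head(); the third Age is deliberately set to missing (illustration only)\n",
    "df = pd.DataFrame({\n",
    "    'Sex': ['male', 'female', 'female', 'female', 'male'],\n",
    "    'Age': [22.0, 38.0, None, 35.0, 35.0],\n",
    "    'SibSp': [1, 1, 0, 1, 0],\n",
    "    'Parch': [0, 0, 0, 0, 0],\n",
    "    'Cabin': [None, 'C85', None, 'C123', None],\n",
    "})\n",
    "\n",
    "# Step 1: fill missing ages with the median of the known ages (one possible choice)\n",
    "df['Age'] = df['Age'].fillna(df['Age'].median())\n",
    "# Step 2: drop a column judged unusable\n",
    "df = df.drop(columns=['Cabin'])\n",
    "# Step 3: encode text as numbers\n",
    "df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})\n",
    "# Step 4: build a new feature from existing ones (FamilySize is a hypothetical name)\n",
    "df['FamilySize'] = df['SibSp'] + df['Parch'] + 1\n",
    "print(df)\n",
    "```"
   ]
  },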
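  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Exploratory analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Analysis is used to find the most suitable machine learning algorithm; candidates include linear regression, neural networks, decision trees, and random forests. This report uses the random forest algorithm."
   ]
  },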
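  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Building the model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use the `RandomForestClassifier` class from sklearn and determine the best hyperparameters with a `GridSearchCV` grid search. Then train on the selected features with those parameters, observe the training accuracy, and choose the best feature set."
   ]
  },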
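  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of a grid search over a random forest. The data here are synthetic (generated with `make_classification`) and stand in for the prepared Titanic features; the parameter grid is a small hypothetical one, not the grid used in this report.\n",
    "\n",
    "```python\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.model_selection import GridSearchCV\n",
    "\n",
    "# Synthetic stand-in for the prepared features and labels (not the real data)\n",
    "X, y = make_classification(n_samples=200, n_features=6, random_state=0)\n",
    "\n",
    "# Hypothetical, deliberately small grid of candidate hyperparameters\n",
    "param_grid = {'n_estimators': [10, 30], 'max_depth': [3, 5]}\n",
    "\n",
    "# Every combination in the grid is scored with 3-fold cross-validation\n",
    "search = GridSearchCV(RandomForestClassifier(random_state=0),\n",
    "                      param_grid, cv=3, scoring='accuracy')\n",
    "search.fit(X, y)\n",
    "\n",
    "print(search.best_params_)   # best combination found on this synthetic data\n",
    "print(search.best_score_)    # mean cross-validated accuracy of that combination\n",
    "```\n",
    "\n",
    "After fitting, `search.best_estimator_` holds a forest refitted on all of `X` with the best parameters, so it can be used directly for prediction."
   ]
  },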
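  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6. Predicting the results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pass the test set to the `predict` function to obtain the predicted `Survived` values, then compare them with the provided labels to obtain the prediction accuracy."
   ]
  },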
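  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The comparison step can be sketched with `accuracy_score` from sklearn. The two label lists below are hypothetical values chosen only to show the calculation; they are not results from this report.\n",
    "\n",
    "```python\n",
    "from sklearn.metrics import accuracy_score\n",
    "\n",
    "# Hypothetical labels, only to show the comparison step\n",
    "y_true = [0, 1, 1, 0, 1]   # provided reference labels\n",
    "y_pred = [0, 1, 0, 0, 1]   # stand-in for what a fitted model's predict() returns\n",
    "\n",
    "# Accuracy is the fraction of predictions that match the reference labels\n",
    "acc = accuracy_score(y_true, y_pred)\n",
    "print(acc)   # 4 of the 5 labels match\n",
    "```"
   ]
  },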
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## V. Experimental Procedure"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Importing the required libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import sklearn\n",
    "import random"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the sklearn modules used below\n",
    "from sklearn import ensemble \n",
    "from sklearn.preprocessing import LabelEncoder\n",
    "from sklearn import feature_selection\n",
    "from sklearn import model_selection\n",
    "from sklearn import metrics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the plotting libraries\n",
    "import matplotlib as mpl\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Loading the data and analyzing its features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 236,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# Load the training and test data\n",
    "train=pd.read_csv(\"train.csv\")\n",
    "test=pd.read_csv(\"test.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>891.000000</td>\n",
       "      <td>891.000000</td>\n",
       "      <td>891.000000</td>\n",
       "      <td>891</td>\n",
       "      <td>891</td>\n",
       "      <td>714.000000</td>\n",
       "      <td>891.000000</td>\n",
       "      <td>891.000000</td>\n",
       "      <td>891</td>\n",
       "      <td>891.000000</td>\n",
       "      <td>204</td>\n",
       "      <td>889</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>unique</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>891</td>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>681</td>\n",
       "      <td>NaN</td>\n",
       "      <td>147</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>top</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Rice, Master. George Hugh</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>CA. 2343</td>\n",
       "      <td>NaN</td>\n",
       "      <td>G6</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>freq</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>577</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>7</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4</td>\n",
       "      <td>644</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>446.000000</td>\n",
       "      <td>0.383838</td>\n",
       "      <td>2.308642</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>29.699118</td>\n",
       "      <td>0.523008</td>\n",
       "      <td>0.381594</td>\n",
       "      <td>NaN</td>\n",
       "      <td>32.204208</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>257.353842</td>\n",
       "      <td>0.486592</td>\n",
       "      <td>0.836071</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>14.526497</td>\n",
       "      <td>1.102743</td>\n",
       "      <td>0.806057</td>\n",
       "      <td>NaN</td>\n",
       "      <td>49.693429</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.420000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>223.500000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20.125000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>7.910400</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>446.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>28.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>14.454200</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>668.500000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>38.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>31.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>891.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>80.000000</td>\n",
       "      <td>8.000000</td>\n",
       "      <td>6.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>512.329200</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        PassengerId    Survived      Pclass                       Name   Sex  \\\n",
       "count    891.000000  891.000000  891.000000                        891   891   \n",
       "unique          NaN         NaN         NaN                        891     2   \n",
       "top             NaN         NaN         NaN  Rice, Master. George Hugh  male   \n",
       "freq            NaN         NaN         NaN                          1   577   \n",
       "mean     446.000000    0.383838    2.308642                        NaN   NaN   \n",
       "std      257.353842    0.486592    0.836071                        NaN   NaN   \n",
       "min        1.000000    0.000000    1.000000                        NaN   NaN   \n",
       "25%      223.500000    0.000000    2.000000                        NaN   NaN   \n",
       "50%      446.000000    0.000000    3.000000                        NaN   NaN   \n",
       "75%      668.500000    1.000000    3.000000                        NaN   NaN   \n",
       "max      891.000000    1.000000    3.000000                        NaN   NaN   \n",
       "\n",
       "               Age       SibSp       Parch    Ticket        Fare Cabin  \\\n",
       "count   714.000000  891.000000  891.000000       891  891.000000   204   \n",
       "unique         NaN         NaN         NaN       681         NaN   147   \n",
       "top            NaN         NaN         NaN  CA. 2343         NaN    G6   \n",
       "freq           NaN         NaN         NaN         7         NaN     4   \n",
       "mean     29.699118    0.523008    0.381594       NaN   32.204208   NaN   \n",
       "std      14.526497    1.102743    0.806057       NaN   49.693429   NaN   \n",
       "min       0.420000    0.000000    0.000000       NaN    0.000000   NaN   \n",
       "25%      20.125000    0.000000    0.000000       NaN    7.910400   NaN   \n",
       "50%      28.000000    0.000000    0.000000       NaN   14.454200   NaN   \n",
       "75%      38.000000    1.000000    0.000000       NaN   31.000000   NaN   \n",
       "max      80.000000    8.000000    6.000000       NaN  512.329200   NaN   \n",
       "\n",
       "       Embarked  \n",
       "count       889  \n",
       "unique        3  \n",
       "top           S  \n",
       "freq        644  \n",
       "mean        NaN  \n",
       "std         NaN  \n",
       "min         NaN  \n",
       "25%         NaN  \n",
       "50%         NaN  \n",
       "75%         NaN  \n",
       "max         NaN  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
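   ],
   "source": [
    "# Summary statistics for the training set\n",
    "train.describe(include='all')\n",
    "# Age and some other columns have missing values, which must be filled in before the algorithm can run\n",
    "# Cabin has too many missing values, so it is dropped"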
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>418.000000</td>\n",
       "      <td>418.000000</td>\n",
       "      <td>418</td>\n",
       "      <td>418</td>\n",
       "      <td>332.000000</td>\n",
       "      <td>418.000000</td>\n",
       "      <td>418.000000</td>\n",
       "      <td>418</td>\n",
       "      <td>417.000000</td>\n",
       "      <td>91</td>\n",
       "      <td>418</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>unique</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>418</td>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>363</td>\n",
       "      <td>NaN</td>\n",
       "      <td>76</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>top</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Brandeis, Mr. Emil</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>PC 17608</td>\n",
       "      <td>NaN</td>\n",
       "      <td>B57 B59 B63 B66</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>freq</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>266</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>5</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>270</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>1100.500000</td>\n",
       "      <td>2.265550</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>30.272590</td>\n",
       "      <td>0.447368</td>\n",
       "      <td>0.392344</td>\n",
       "      <td>NaN</td>\n",
       "      <td>35.627188</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>120.810458</td>\n",
       "      <td>0.841838</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>14.181209</td>\n",
       "      <td>0.896760</td>\n",
       "      <td>0.981429</td>\n",
       "      <td>NaN</td>\n",
       "      <td>55.907576</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>892.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.170000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>996.250000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>21.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>7.895800</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>1100.500000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>27.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>14.454200</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>1204.750000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>39.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>31.500000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>1309.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>76.000000</td>\n",
       "      <td>8.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>512.329200</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        PassengerId      Pclass                Name   Sex         Age  \\\n",
       "count    418.000000  418.000000                 418   418  332.000000   \n",
       "unique          NaN         NaN                 418     2         NaN   \n",
       "top             NaN         NaN  Brandeis, Mr. Emil  male         NaN   \n",
       "freq            NaN         NaN                   1   266         NaN   \n",
       "mean    1100.500000    2.265550                 NaN   NaN   30.272590   \n",
       "std      120.810458    0.841838                 NaN   NaN   14.181209   \n",
       "min      892.000000    1.000000                 NaN   NaN    0.170000   \n",
       "25%      996.250000    1.000000                 NaN   NaN   21.000000   \n",
       "50%     1100.500000    3.000000                 NaN   NaN   27.000000   \n",
       "75%     1204.750000    3.000000                 NaN   NaN   39.000000   \n",
       "max     1309.000000    3.000000                 NaN   NaN   76.000000   \n",
       "\n",
       "             SibSp       Parch    Ticket        Fare            Cabin Embarked  \n",
       "count   418.000000  418.000000       418  417.000000               91      418  \n",
       "unique         NaN         NaN       363         NaN               76        3  \n",
       "top            NaN         NaN  PC 17608         NaN  B57 B59 B63 B66        S  \n",
       "freq           NaN         NaN         5         NaN                3      270  \n",
       "mean      0.447368    0.392344       NaN   35.627188              NaN      NaN  \n",
       "std       0.896760    0.981429       NaN   55.907576              NaN      NaN  \n",
       "min       0.000000    0.000000       NaN    0.000000              NaN      NaN  \n",
       "25%       0.000000    0.000000       NaN    7.895800              NaN      NaN  \n",
       "50%       0.000000    0.000000       NaN   14.454200              NaN      NaN  \n",
       "75%       1.000000    0.000000       NaN   31.500000              NaN      NaN  \n",
       "max       8.000000    9.000000       NaN  512.329200              NaN      NaN  "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Summary statistics for the test set (all columns, numeric and categorical)\n",
    "test.describe(include='all')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 232,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>...</th>\n",
       "      <th>Age_</th>\n",
       "      <th>age</th>\n",
       "      <th>SibSp_</th>\n",
       "      <th>sibsp</th>\n",
       "      <th>Fare_</th>\n",
       "      <th>fare</th>\n",
       "      <th>Cabin_</th>\n",
       "      <th>cabin</th>\n",
       "      <th>Survived</th>\n",
       "      <th>survived</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>892</td>\n",
       "      <td>3</td>\n",
       "      <td>Kelly, Mr. James</td>\n",
       "      <td>male</td>\n",
       "      <td>34.5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>330911</td>\n",
       "      <td>7.8292</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>(19.0, 38.0]</td>\n",
       "      <td>1</td>\n",
       "      <td>m</td>\n",
       "      <td>1</td>\n",
       "      <td>(-0.512, 128.0]</td>\n",
       "      <td>0</td>\n",
       "      <td>no</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>893</td>\n",
       "      <td>3</td>\n",
       "      <td>Wilkes, Mrs. James (Ellen Needs)</td>\n",
       "      <td>female</td>\n",
       "      <td>47.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>363272</td>\n",
       "      <td>7.0000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>(38.0, 57.0]</td>\n",
       "      <td>2</td>\n",
       "      <td>m</td>\n",
       "      <td>1</td>\n",
       "      <td>(-0.512, 128.0]</td>\n",
       "      <td>0</td>\n",
       "      <td>no</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>894</td>\n",
       "      <td>2</td>\n",
       "      <td>Myles, Mr. Thomas Francis</td>\n",
       "      <td>male</td>\n",
       "      <td>62.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>240276</td>\n",
       "      <td>9.6875</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>(57.0, 76.0]</td>\n",
       "      <td>3</td>\n",
       "      <td>s</td>\n",
       "      <td>2</td>\n",
       "      <td>(-0.512, 128.0]</td>\n",
       "      <td>0</td>\n",
       "      <td>no</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>895</td>\n",
       "      <td>3</td>\n",
       "      <td>Wirz, Mr. Albert</td>\n",
       "      <td>male</td>\n",
       "      <td>27.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>315154</td>\n",
       "      <td>8.6625</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>(19.0, 38.0]</td>\n",
       "      <td>1</td>\n",
       "      <td>m</td>\n",
       "      <td>1</td>\n",
       "      <td>(-0.512, 128.0]</td>\n",
       "      <td>0</td>\n",
       "      <td>no</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>896</td>\n",
       "      <td>3</td>\n",
       "      <td>Hirvonen, Mrs. Alexander (Helga E Lindqvist)</td>\n",
       "      <td>female</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>3101298</td>\n",
       "      <td>12.2875</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>(19.0, 38.0]</td>\n",
       "      <td>1</td>\n",
       "      <td>s</td>\n",
       "      <td>2</td>\n",
       "      <td>(-0.512, 128.0]</td>\n",
       "      <td>0</td>\n",
       "      <td>no</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 23 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   PassengerId  Pclass                                          Name     Sex  \\\n",
       "0          892       3                              Kelly, Mr. James    male   \n",
       "1          893       3              Wilkes, Mrs. James (Ellen Needs)  female   \n",
       "2          894       2                     Myles, Mr. Thomas Francis    male   \n",
       "3          895       3                              Wirz, Mr. Albert    male   \n",
       "4          896       3  Hirvonen, Mrs. Alexander (Helga E Lindqvist)  female   \n",
       "\n",
       "    Age  SibSp  Parch   Ticket     Fare Cabin  ...          Age_  age  SibSp_  \\\n",
       "0  34.5      0      0   330911   7.8292   NaN  ...  (19.0, 38.0]    1       m   \n",
       "1  47.0      1      0   363272   7.0000   NaN  ...  (38.0, 57.0]    2       m   \n",
       "2  62.0      0      0   240276   9.6875   NaN  ...  (57.0, 76.0]    3       s   \n",
       "3  27.0      0      0   315154   8.6625   NaN  ...  (19.0, 38.0]    1       m   \n",
       "4  22.0      1      1  3101298  12.2875   NaN  ...  (19.0, 38.0]    1       s   \n",
       "\n",
       "  sibsp            Fare_ fare  Cabin_ cabin  Survived survived  \n",
       "0     1  (-0.512, 128.0]    0      no     0         0        0  \n",
       "1     1  (-0.512, 128.0]    0      no     0         0        0  \n",
       "2     2  (-0.512, 128.0]    0      no     0         0        0  \n",
       "3     1  (-0.512, 128.0]    0      no     0         0        0  \n",
       "4     2  (-0.512, 128.0]    0      no     0         0        0  \n",
       "\n",
       "[5 rows x 23 columns]"
      ]
     },
     "execution_count": 232,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Filling missing values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "train['Age'] = train['Age'].fillna(train['Age'].median())  # fill missing ages with the median age\n",
    "test['Age'] = test['Age'].fillna(test['Age'].median())\n",
    "test['Fare'] = test['Fare'].fillna(test['Fare'].median())  # the test set has one missing fare\n",
    "train['Embarked'] = train['Embarked'].fillna('S')  # fill with the most frequent port, 'S'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = [train, test]  # both datasets, so each feature step can loop over them"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Feature construction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Encode categorical columns such as sex as integer labels\n",
    "label = LabelEncoder()\n",
    "for a in data:\n",
    "    a['sex'] = label.fit_transform(a['Sex'])  # new encoded feature 'sex'\n",
    "    a['embarked'] = label.fit_transform(a['Embarked'])  # new encoded port-of-embarkation feature 'embarked'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEGCAYAAABiq/5QAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAA6FUlEQVR4nO3deXxU1f3/8deZPfseCEtYEwQVsOxCRYv6VXGrQhU36kZdq7bWWrvaqvX3+/WrtdWquCuurSIIilJ2RSDIviSEHRKyQvZk1vP7IwmyE8LM3Mncz/Px4DGZmTv3fhIm7zk599xzlNYaIYQQ5mExugAhhBDhJcEvhBAmI8EvhBAmI8EvhBAmI8EvhBAmYzO6gLZIT0/XPXv2NLoMIYToUL777rsKrXXGkY93iODv2bMnK1euNLoMIYToUJRSu471uHT1CCGEyUjwCyGEyUjwCyGEyUjwCyGEyUjwCyGEyUjwCyGEyUjwCyGEyUjwCyEihkwTHx4S/EKIiLBjxw6uvPJKVq9ebXQpUU+CXwgREdatW0dtbS3z5883upSoJ8EvhBAmI8EvhBAmI8EvhIgISimjSzANCX4hRESQET3hI8EvhIgI0uIPHwl+IYQwGQl+IUREkK6e8JHgF0JEBL/fb3QJpiHBL4SICF6v1+gSTEOCXwgREST4w0eCXwgREVqDX/r6Q0+CXwgREVr7+KWvP/Qk+IUQEcHn8x12K0InpMGvlHpIKbVRKbVBKfW+UsqllEpVSs1VShW23KaEsgZxuKVLl/LGG28YXYYQRwkEAofditAJWfArpboCPweGaq3PAqzA9cCjwDytdQ4wr+W+CJPHHnuMt956y+gyhDhKaxePBH/ohbqrxwbEKKVsQCxQDFwFtCbPW8DVIa5BCNEBtJ7UlZO7oRey4NdaFwF/A3YD+4BqrfVXQCet9b6WbfYBmcd6vVJqilJqpVJqZXl5eajKNC3pRxWRSoI/9ELZ1ZNCc+u+F9AFiFNK3dTW12utp2qth2qth2ZkZISqTNPyeDxGlyDEYVonaZPJ2kIvlF09FwI7tNblWmsv8AlwLlCqlMoCaLktC2EN4jgaGxuNLkGIw0jwh08og383MFIpFaua/yfHAZuBmcDklm0mAzNCWIM4jvr6eqNLEOIwFktzHFmtVoMriX62UO1Ya71cKfUfYBXgA1YDU4F44COl1O00fzhMDFUN4vhqa2uNLkGIw7QGfusHgAidkAU/gNb6j8Afj3jYTXPrX4TZoVdE1tTUGFiJEEdrDX6bLaSxJJArd03l0Fb+gQMHDKxEiKO1Br509YSeBL+JVFZWHvx6//79BlYixNHsdrvRJZiGBL+JHHo9hFwbISKNtPTDR4LfREpLSwHQ9hhKSkoMrkaIw0nffvhI8JtISUkJWKz44jIoKt5ndDlCHKY1+OXK3dCT4DeRoqIicCagnYmUlpTIZFgiosgFXOEjwW8iRUXF+BzxBFwJeL2ew072CmG01sCXFn/oSfCbhNaaoqIiAs5EAs4EAIqLiw2uSojvSUs/fCT4TaK6upqmpkYCzgQCzkRAgl9EFmnph48Ev0ns29d8MjfgSkA74kEpCX4RUVqDX1r+oSfBbxKtwa8dCWCxoJzxMqRTRJTWwQbS8g89CX6TaA35gDMOAJ89ToJfRBS32w1Iiz8cJPhNory8HGVzgtUBQMAeR2mZLIUgIkdr8EuLP/Qk+E2ivLwc7Yg9eD/giKWyslJ+yUTEaF0joqmpyeBKop8Ev0lUVFTis8UcvK/tsfh9Pqqrqw2sSojvtc4eK1OGh54Ev0lUVFSg7YcEf0vrX2bpFJGidarw/fvlwsJQk+A3Aa01B6oOELB/39XT+iEgwS8iRUVF84yxlRUyc2yoSfCbQG1tLX6fD31I8Ld+CFRUVBhVlhCHKW0ZcnygqubgiV4RGhL8JtAa7tpxaB9/89cyX4+IBLW1tdTWN9A93gcgQ41DTILfBFoX
XdH2uO8ftNpRdqcsyCIiwu7duwE4J91z2H0RGhL8JlDWMl4/4Ig77PGAPe7gc0IYadeuXQAMzWgO/p07dxpYTfST4DeBkpISUJbDxvED+B1xsiCLiAhbt27FZVNkx/vJjG2+L0JHgt8EiouLwRkP6vD/7oAzgX37iuUiLmG4goJ8usd5sSjoGe+mIH+z0SVFNQl+E9i9ew++ljn4DxVwJuJxu+UErzCU1+ulcEshfRK9APRO9FFSWnZwXL8IPgn+KKe1Zm/RXgLOpKOeC7iaH9uzZ0+4yxLioG3btuHxeumb1Dyip09i8+3GjRuNLCuqSfBHufLyctxNTQRiko96rvUxGUEhjLRu3ToAcpOaW/y9EnzYLLB+/Xojy4pqEvxRrnW0RGvr/lDaHouy2g9uI4QR1q5dS2asJtnZfK7JYW0O/7Vr1xhbWBST4I9yB4P/GC1+lMLvSpLgF4YJBAKsW7OG/kmHX6nbP9nDli2FNDQ0GFRZdJPgj3K7d+9G2Z1om+uYz/tdSezYKcEvjLFt2zZq6+s5I9l72ONnpPgIBALS3RMiEvxRbueuXfidSXCcVY0CrmT2V1ZIy0oYYs2aNQD0T/Ed9nhOkherBVavXm1AVdFPgj/K7d69B/8x+vdbtfb97927N1wlCXHQ6tWr6BSrSXUFDnvcaW0e3bN61XcGVRbdQhr8SqlkpdR/lFL5SqnNSqlRSqlUpdRcpVRhy21KKGsws4aGBqoO7CfgTDzuNgFX83MypFOEm9/vZ+2aNfRPPvZMnAOSPRQWbqWuri7MlUW/ULf4nwPmaK3PAAYBm4FHgXla6xxgXst9EQJFRUXA9+F+LK0fCsXFxWGpSYhWhYWF1Dc00v+I/v1W/VO8BLQ+ONxTBE/Igl8plQicB7wGoLX2aK2rgKuAt1o2ewu4OlQ1mF1rmAeOcdXuQVYbyhl38ENCiHBp7b/vn3Ls4O+T6MNulX7+UAhli783UA68oZRarZR6VSkVB3TSWu8DaLnNDGENpravZWGLEwY/4LPHH9xWiHBZvXoVXeK/H79/JIcVchK9rPpuZZgri36hDH4b8APgRa31OUA9p9Cto5SaopRaqZRaKXPGt09JSQnK5gSb84TbBZzxFElXjwgjr9fL2rVrGZB04pW2BqR42bZ9B1VVVeEpzCRCGfx7gb1a6+Ut9/9D8wdBqVIqC6Dl9pgTwmutp2qth2qth2ZkZISwzOhVUlJCwBl/0u0Cjnj2V1bi8/lOuq0QwbB582bcbg8DUo/dzdNqQEs3kHT3BFfIgl9rXQLsUUr1a3loHLAJmAlMbnlsMjAjVDWYXfG+ffjscSfdTjvjCQQCshqXCJu8vDwsiuOe2G3VK8FHrF2xcqV09wSTLcT7vx94VynlALYDt9L8YfORUup2YDcwMcQ1mJLWmtKSUnRyn5NuG3A0/1VQUlJCVlZWqEsTgry8FfRO9BNnP/FaEFYL9E9uYsXyZWitUce5EFGcmpAO59Rar2nprhmotb5aa31Aa12ptR6ntc5pud0fyhrM6sCBA7jdTSc9sQvfn/yVIZ0iHA4cOEBBwRYGpp64f7/VwFQv5RWV7NixI8SVmYdcuRul2jSUs4VuWZ1LhnSKcFi+fDlaawaledq0/aD05u6gb7/9NpRlmYoEf5RqvRL3WNMxH0VZwJUo0zaIsFiyZAmpLuiZ4G/T9qnOAL0S/Xy9ZHGIKzMPCf4otWvXLrBYm1vzbeB1JrJ9x87QFiVMr6GhgZV5K/hBeuNR8wZO2xLLtC2xx3zdkHQ3m/MLKCs75iBAcYok+KPU9u3b0THJRy2wfjyBmBSKi/bidret31WI9vj2229xe7wMzzi6m2d3nY3ddccebzIss/l9uWjRopDWZxYS/FFIa01+QQE+V9vnvwvEphIIBNi+fXsIKxNmN2/ePFJckJt8ateMZMUG6JEQ4L//nRuiysxFgj8KlZaWUlNdjT++7Re++eOat928eXOoyhImV11dzYrlyxmZ
0YilHaMyz+3USEHBFplJNggk+KNQ66pF/ri2T4OkHXEoZxwbNmwIVVnC5BYuXIjP72dU5/Z1J47s5EYBX331VXALMyEJ/ii0Zs0alM1JIPYUljpQCk9cJ1atWo3WJ76oRoj2mPPF53SLD9Ajvm2jeY6U4tScmepl7ldfEggETv4CcVwS/FFGa82y5SvwxHdu84ndVr7ELlRVHZB+fhF0e/bsYXN+AWM6Hz2a51SM6dxESWmZrMV7miT4o8yOHTuorCjHn9ztlF/rT2p+zbJly4JdljC5r776CqVgVKfTGzU2JMOD0ybdPadLgj/KLFmyBABfUvdjPu/cvQzn7mMHu3bEEojPYNFiuVBGBI/Wmv9+9SUDUrykHGfu/bZyWmFoehMLF8zH42nblb/iaBL8UWb+ggUEEjqhHce+EMbSsB9Lw/GnR/ImZ7OloEAWZhFBU1BQwL7SMkZlBucakVGdPNQ3NJKXlxeU/ZmRBH8U2bZtG7t27sST0qvd+/Cm9gaax1sLEQwLFy7Eqpq7aYJhQIqXeEfzfkX7SPBHkS+//BKUBV9q+4NfOxMIJHTi8y/myOgeERTffL2E/inek07B3FY2CwxKdbPs26WyeFA7SfBHCa/XyxdzvsSb3B1tjzmtfbnTcigu2isjJ8Rp27t3L3v2FjG4jTNxttU56R5q6+rZtGlTUPdrFhL8UeKbb76htqYab3ruae/Ll9oLZXMwa9asIFQmzGzVqlUAnB3k4D8zxYs6ZP/i1EjwR4kZM2eCMx5/UtfT35nVjjulNwsWLKCmpub09ydMa/Xq1aS4oHNMcC+4irNreib6WbXqu6Du1ywk+KPArl27WL1qFe703FO+aOt4vJn98Hq9zJkzJyj7E+a0Yf06chLdp3XR1vHkJHooyM+Xfv52kOCPAjNnzgSLBW9Gv5Nv3EaB2DQCCZ2Y/umncnm8aJeKigrKKyrpmxSaYO6b5MPt8bJt27aQ7D+aSfB3cE1NTXwxZw7e5B6nfVL3SO6MM9hXXCz9qKJdCgoKAOiTGJrg792y39bjiLaT4O/gFixYQEN9Pd7M/kHfty+lJ8ru4tNPZwR93yL6bdmyBaUgOz40wZ/hChDnUBL87SDB38HNmDkTHZOMP75T8HduseJO68vSpd9QWVkZ/P2LqJafn0/XuABOa2j2rxT0jPNQUJAfmgNEMQn+Dmz79u3kb97cclI3BGfPAE9GPwKBQPPFYUK0kdaags2b6Bkf2vl0eid62bFjhywZeook+DuwOXPmgMWCL61PyI6hXUkEEjoxa/ZsuZJXtFlRURFVNbXkhOjEbqu+ST78/gD5+dLqPxUS/B2Uz+fjq6/m4k3sFvSTukfypOVQXFQkv1yizdasWQOc+tq6p6p1xNDatWtDepxoI8HfQa1Zs4aqqgP40vqG/FjelB4oi5W5c2Wha9E2y5cvJ9UFXWLbt9pWWyXYNb0T/Sz7dmlIjxNtJPg7qPnz56NsDnztWHDllNmceBK7MX/BAvz+0P4ii46vsbGRvBXLGZTaFKpTT4cZlOZmc36BDEA4BRL8HZDP52PRosV4krqDxRaeY6b2ourAAdatWxeW44mOa/HixTS5Pae92lZbjezkRmstq3KdAgn+DmjVqlXU19fhPY3pl0+VL7k7ymJjsazOJU5Aa82MGZ+SGavpF+L+/VZZsQFykn18NnOGTN/QRicNfqVUJ6XUa0qpL1ruD1BK3R760sTxLFq0CGVz4E/sEr6DWu14EruyYOEimcJBHNeaNWvYtGkzl3SrD0s3T6tLujVSvK+ERYsWhe+gHVhbWvxvAl8CrSmzBXgwRPWIk/D5fCxavBhPYrewdfMcPHZqT6oO7GfDhg1hPa7oGLTWvPbqqyQ74bys8I6rH5LhoWt8gDdefw2v1xvWY3dEbQn+dK31R0AAQGvtA+QMn0FWrVpFXW3taa2y1V6+5GyUxcaCBQvCfmwR+b788ks2bNzItb3qcIToat3jsSi4vk8de4uK
+eijj8J78A6oLcFfr5RKAzSAUmokUN3WAyilrEqp1UqpWS33U5VSc5VShS23Ke2q3KTmzZvXPJonGPPunyqrHU9SN+bNXyB9qeIwlZWVvPivF+ib5OeHYW7ttxqU5mVIuoe333qTXbt2GVJDR9GW4P8FMBPoo5T6BngbuP8UjvEAsPmQ+48C87TWOcC8lvuiDRobG1m4aBGe5B5h7+Zp5UvtTU11FStXrjTk+CLyBAIBnnzyCZrq67j9jFosYezbP9It/epw4OXPj/9JpnE4gZMGv9Z6FTAWOBf4GXCm1rpNY/qUUt2A8cCrhzx8FfBWy9dvAVefQr2mtnDhQtxNTXjTcwyrwZfcHWWP4fPPPzesBhFZXnvtNVatWs1NObV0jTO2FzjFqbnzjBq2bd/Bs88+K9OMHMdJm41KqWuOeChXKVUNrNdal53k5X8HHgESDnmsk9Z6H4DWep9SKvMU6jUtrTUff/JJ6GbibCuLFXdaH77++mvKysrIzJT/PjObNWsW7777Lud3aQr7Cd3jGZzu5eqeDXw6Zw5du3bl5ptvNrqkiNOWrp7baW6x39jy7xWau3++UUod9yeqlLocKNNat2tRTKXUFKXUSqXUyvLy8vbsIqqsXbuWrYWFuDP7h2wmzrbyZPYnoDUff/yxoXUIYy1atIhnnnmGs1K93JIb3uGbJ/PjXo2c28nNa6+9xowZsp7EkdoS/AGgv9b6Wq31tcAAwA2MAH59gteNBq5USu0EPgB+pJSaBpQqpbIAWm6P+VeD1nqq1nqo1npoRkZGm7+haKS15tXXXkM5Yg3t5jlYjzMBb2pvPpk+XS6TN6lFixbx+OOP0yfBy/1n1WCLsEtBlYI7+tdxTrqHZ599VsL/CG357+qptS495H4ZkKu13g8cd8Cs1vo3WutuWuuewPXAfK31TTSfKJ7cstlkQP5HTmLRokVsWL+exqxBhp3UPZK7yzl4vT5efvllo0sRYTZr1qyDof/woGpiIuMteRSbBe47q/Zg+E+bNk36/Fu0JfiXKKVmKaUmK6Vag3qxUioOqGrHMZ8GLlJKFQIXtdwXx1FVVcUzz/6dQFx6UBdTP13alYi789l89dVXLFu2zOhyRBhorXn11Vf529/+xlkpbh4eVEWMLbKD1G6B+8+q5dxO7oO1y1DktgX/vcAbwOCWfysArbWu11pf0JaDaK0Xaq0vb/m6Ums9Tmud03K7v32lRz+Px8Pvfv97amtraew5GlRk/T3t6TIYHZvKX/7yBHv27DG6HBFCtbW1/O53v2XatGlc0KWJh86uidiW/pFsFvjZgDqu7NnA7Nmz+cUvHjJ9F2VbhnNqYBvN3To/BsZx+Lh8EQJer5e//vWvbFi/noaeYwjEphld0tEsVur7jqPB6+eRR35NWdnJBnmJjqiwsJApd97BsqVLuTGnnp/2q8caWW2Qk1IKJvRu5K4BtRRsXM8dt9/G6tWrjS7LMMf971NK5Sql/qCU2gw8D+wBlNb6Aq3182Gr0IRqa2v51SOPsGDBAtzdhoZ0acXTpZ0J1PUZR0l5BVN+dhdbtmwxuiQRJH6/n/fee4977r4Ld1Upj/2gmv/pHp459kPl3M4e/jikCpe3il/+4he8/PLLeDyhXRc4Ep3oczuf5tb9FVrrMVrrfyJz9ITcunXr+Nldd7NmzVoae/0QT9ZAo0s6qUB8JnX9xlPV4OG++3/Op59+KjN4dnB79uzh/vvuZerUqQxOaeAvQ/eHfP3ccOkW7+dPQ/ZzXlYj77//Pj+bcqfpGizqeGe5lVI/pnk0zrnAHJqHZL6qtQ777GBDhw7V0T5FQF1dHVOnTmXmzJngSqChxxj8iVlBPYZz9zLsFYUA+GPTCMSm4s4eGbT9K08DMTsWY60pZsCAM/nVrx6mV6/wTyYn2s/j8fDBBx/wzjtvY8fHLTm1jOrkCXkrf9qWWJbscwLQI8FPdryPm3IbQntQYG2Fnde2JFLjsTBx4kQmT55MbGxs
yI8bLkqp77TWQ496/GTDm1pG71wNTAJ+RPM0C9O11mFb7iaag7+mpobp06fz0b//Q319HZ7MAbi7/gCs9qAfKyb/c2y1JQfv+xI603jGZcE9iNbYKrcSuzcP5fcwbtw4brzxRnr27Bnc44igW7VqFc8+87/s2VvE8Ew3N+bUk+IMz6idp1Ylkl/1/Xv+jGQvj/2gJizHrvMqPtoWy8JiFxnpafz8gQcZM2YMqiP3abVod/AfsZNUYCJwndb6R0Gs74SiMfjLysqYPn06n0yfjrupCV9yNu4ugwnEpYfsmGEJ/hbK24hj3zqcFQVov48xY37IpEnXM2DAgKj4hYomxcXFvPTSSyxevJjMWM0tObUMTAvvnPZGBn+rwmobbxYksKfOwtAhP+Cee++jd+/eYa0h2IIS/EaJluB3u90sWbKEz7/4gtWrVqEBb0ovPFkDCcSmhvz44Qz+VsrbhL1sE66yzWifm+7ds7nssku56KKLSE8P3YecOLn6+nqmTZvGf/79ERbtZ3x2A5dlN4Z9Ln2IjOAH8AVgfpGL6TvjaPQpLr/iCm677TaSk5PDXkswSPAbxOfzsWbNGhYtWsR//zuPxsYGcCXgTu2DNz0H7Uw4+U6CxIjgP8jvxb5/B47KQiy1pSilGD58OOPGjWPUqFEkJITv52B2Pp+Pzz77jLfeeJ2qmlrGdG5iYp+GsHXrHEukBH+rOq/i0x0x/LcoBpfLxY033cyECRNwOp2G1dQexwv+DnIJRsfS0NBAXl4eX3/9Nd8sXUpDfT3KasOT3ANvdi7+hM6GT7QWdlY73oxcvBm5qKZq7BWFLF+zkeXLl2OxWhk8aDA//OEYRo8eLTN+hojWmq+//pqXX3qRvUXFnJHs48GhdfROlMF6R4q3a27KbeCCrm4+2hbLK6+8wqfTP+H2O+7k4osvxmLpYBcyHEFa/EFSVFTEihUrWL58OStXfofP50XZXXiSuuNLzsaX2BWsxn7OGtriPxatsdSXYzuwG2f1bmisAiAnN5dzR41i+PDhnHHGGVitBvQ9RJn8/HxeeP6frN+wkS5xmuv61DI4zRsx7Y9Ia/EfKf+Ajfe3xbOjxkrfvn249977OOecc4wu66SkqyfIGhsbWb16NXl5eSxbtpx9+4qbn3Al4knqhi+5B/6EThE1zULEBf8RLI1V2Kp2Ya/ag6W+HLQmLi6eYcOGMmLECIYNGybnBU5RWVkZr7zyCnPnziXRCdf0rGNsljvirryN9OAHCGhYXurgox0JVDbCueeey1133UV2drbRpR2XdPWcJq0127ZtIy8vj+UrVrB+/Xr8Ph/Kascb3xlf9kh8Sd3QrkSjS+2wAjHJeGKS8WQNAl8TtupivDVFLFqax8KFCwHo2bMXI0YMZ9iwYZx99tkdrs81XHw+H//5z394443XCXg9XNGjgct7NEX8pGqRzKJgVGcPQzIq+XKPi1krlnLb8uVcP2kSN998c4d6L0rwn0BVVRV5eXkHw766qgoAHZuKN+0MfEldm1v1ETJVclSxufCl9caX1psmrbE0HsBWvZdt+4vY+dG/+fDDD7E7HAwePJjhw4YxfPhwsrOzZagosGHDBv73b/+PHTt3cU66h5tz6kmPkSupg8VhhSt6NjG2i5v3t8Yybdo05v13Lg8+9AtGjBhhdHltIl09h9Bas3PnTpYuXcrX33xD/ubNaK2b++oTsvAldcOf2BXt6JhX9kV6V0+b+b1Ya0uwVRfhqC0+eG6gU+fOjBk9mtGjRzNw4EBsNnN9IAcCAd555x3efPMNUl1wU98ahmSEdzx+e3WErp7j2XTAxltbEtlXr7j22mu56667sNuDfwFme0hXz3ForVm3bh2LFy9mydffUFbaHIyBuHS8WYPxJXdvnhlTWpKRw2rHn9wdf3J33IBy12KrLqKoajcfT/+Ujz/+mJjYWEaNHMnolg8Cl8tldNUhVV1dzRNP/IW8vJWc28nNT/vV4TL9b3d4DEjx8cSw
/Xy4NZaPP/6YzZs28qfH/xzRo9NM+9aorKzkyy+/5LNZs9hXXAwWK76ELvh6nIsvuTvaEWd0iaKNtDMBb+YZeDPPAL8XW00xnqrdLPj6W+bPn09MbCz/c/HFjB8/npwc45euDLaGhgYeevABdu/ayU/71XFBF7e0U8LMboGbchvITfbxan4+9993Ly++9DKpqaG/MLM9TBf8hYWFvPXW2yxdupRAwI8/oTOeXufhS+kRkvlxRJhZ7fhSeuBL6YFba6y1JXgrtjBj5iw+/fRT+ubkcP111zFu3LioOB/g8/l4/PE/sXPnTn4xsCbsUy2Iww3P9JDuquKp1fDbx37D35/7R0Se9DVN8Pt8PqZNm8bbb7+DtjpoyhyANz0XHZNkdGkiVJTCn5iFPzGLJp8be+U2Cou28MQTTzBv3jwefvhh0tIicIGbUzB37lyWL1/B5Nw6Cf0I0TvRz10DavnH+gL+/e9/c9NNNxld0lEibDRvaLjdbu659z7efPNN3Mk9qTnrGjzdh5kv9P0eXC4XEyZMaO7z9ptoAQqbE2+nAdQNuIqm7sNZtjyPWyb/lMLCQqMrOy1LliwhPQZ+1NVtdCmnpdGnDntvNvo69l9jQzM89E3ys2TJYqNLOSZTBP+6devYUpBPU49zaeozFmyR96dXOCifh8svv5z77ruP8ePHo3wmCv5WSuHtfBZ1A66kobGRL774wuiKTsuG9evol9Tx+/QbfOqw92ZDBw9+gH5JHgoKtuD1Rt5fYqbo6ikoKADAl9jF4EqMpW0OZs2ahdaa2bNno20dc1hqMARcSfjt8WzOzze6lNPSp29fdhd2jGGPJxJr04e9NztFwYVmu+ps9MjuHjFDOw9lihb/8OHDcTicxG2bD76O/SfxabE6aGpq4uOPP6apqQmsDqMrMobWOHd9i6WpirHnnWd0Nadl5MhR7KmzsLYy8sLlVMTY9GHvzY5+hfH2Giubq+yMGDnK6FKOyRTBn5uby1NPPYnNU0PCpk+xl+WDrAlrStaafcQVzMZRns+kSZO47rrrjC7ptFx55ZX06d2LFzclsa/eFL/OEe+AW/HchmQyMjK54YYbjC7nmEzzThk6dCj/eO45zuzbC9eupSRs/ARbxVYIyJS0UU9rLLWlxG75ktiCL0h3+Hn44YeZMmVKhx/SGRMTw5NP/RVHTDx/XZNCQZUpem8j1u46K0+uTqFRO3nqr09H7AIupgl+gDPPPJPnn/8nTz/9NL26pBOzYzGJ6z/CuScP1VRtdHki2Hxu7KWbiN80g7j82SQFarjnnnt4/733uPzyyzt86Lfq3Lkzzz73D+LSsnh6dRJz97roADOxRJ1lpQ7+8l0Kflcq//vMMxG9bKPpmgdKKUaOHMnw4cPJy8vjs89msXTpNzhK1uNPzMKTnosvuYfhc+eLdtIaa10p9opCHAd2oP0++ubkcNWVtzJu3DhiY6PzhHavXr146eWpPPXkk7yzbBlrKx38NLdOJmcLgxqPYtqWOJaVOTn7rAH86fE/R/z1IaZNN4vFwogRIxgxYgSVlZV8/vnnzPzsM8q3L0LZHM2rZaX1NedqWR2QaqrGXrkN5/5t0FSL0+Xi4ssu5fLLL6dfv35GlxcWCQkJPPnUU0yfPp1Xpk7lN3kOJvSq46JuTVjkLRx0WsPXJU7e3xaPO2DlttsmM2nSpIgcxXMk0wb/odLS0rj55pu58cYbWbt2LV999RXzFyzAXVEIzvjv18eVufYji8+Dff92HJVbsdSVoZTinB/8gEv+53/44Q9/SExMjNEVhp3FYuHaa69lzJgxPPPM//Lu8hUsKYnhpr61nJHiM7q8qLGz1sq0wni2VNk4+6wzefhXj9CjRw+jy2ozmZb5OJqamvjmm2/4Ys4cvlu5Eq1187w+aX3xpfbqkPP6RMW0zFpjrSlu7sqp2o0O+Oienc1ll17KhRdeSEZGhtEVRgytNYsWLeJfLzxPWXkFIzLdXN+3gTRX5HX/dJRp
mWs8io+3x7Kw2EVSYgJ3TPkZl112WcSuwSvTMp8il8vFuHHjGDduHGVlZcydO5dZsz9n386vUXuW407tjTezP4HYyJx9L9oobyP2ii04K7ZAUy2xcXFcdMV4Lr30Uvr16xc1J2qDSSnF+eefz8iRI3n//fd5/733WL3cyfjsBi7LbsQpSxm3mS8A/93r4tNdcbgDFiZMvJZbbrmFhIQEo0trFwn+NsjMzOTGG2/khhtuYMOGDcyePZt58+bhLS8gkNAJd0b/5tk9LfKbFGyWujIcZZuxH9gJAT+DBw/miiuuYMyYMRE562Ekcrlc3HrrrVx66aW89NJLTF+4kMUlsVzXu5YRmR45hXUSayvtvLc1gX31imHDhnLvvffRs2dPo8s6LdLV007V1dXMmTOHT6ZPp7SkBJzxNGWeiTcjN2K7gZy7l2GvaJ6UzB+bRiA2FXf2SIOrOgatsVbvwVWyHkttKa6YGC695BKuvvrqDtWPGqnWrl3LP//5D7Zu3cYZyT5uya2jW7yx17NEYldPeaOFaYVxrK5w0K1rF+69735GjhzZof66PF5XT8iCXynVHXgb6AwEgKla6+eUUqnAh0BPYCfwE631gRPtKxKDv1UgEGDFihW8+957rF+3DmV30ZTRH0+nARE5GVxM/ucAkdm3rwPY9u/AVbIe1bCf9IwMbpg0iUsuuSRqh2Eaxe/3M3v2bF6Z+jL19fVc3K2RH/dqIMagPoBICn6PHz7fHcNnu2Ox2hxM/umtTJgwoUOM1jmSEX38PuCXWutVSqkE4Dul1Fzgp8A8rfXTSqlHgUeBX4ewjpCyWCyMHDmSkSNHsmHDBt59912+/fZbXGWbaOp8Fp7MM+WagJPRGlvVblzFq1ANB+ie3YObb7qLH/3oR6ZbNzdcrFYrV155JWPHjuWVV15h9uxZrCiP4ae5NQxOj7zZJMOloMrG6wXN6+decMEF3H333RG9hGJ7ha2rRyk1A3i+5d/5Wut9SqksYKHW+oQDrSO5xX8shYWFvPrqayxfvgwcsTRlDcKb0Q+U8Wf+I63Fb63Zh6voOyx1ZXTt1o07br+dsWPHRuwoiWi1ceNG/t///T/s3LWbUZ3c3JhTT6IjfN3ARrf4G32KD7fFMr/IRedOmfzy4V8xbNiwsB0/VMLe1XPEwXsCi4GzgN1a6+RDnjugtU45xmumAFMAsrOzh+zatSvkdQbb+vXrmTr1FdavX4eOSaax2zD8Sd0MvSAsUoJfNVXj2pOHrWo3aekZ3H7brVx88cXSwjeQ1+vl3XffZdo77xBr83NHv/C1/qdtiWXJvuau0R4JfrLjfdyU2xCWYxdU2Zi6OZGKJsW1107gtttui5quRcOCXykVDywCntRaf6KUqmpL8B+qo7X4D6W1ZunSpbzwwr8oLi7Cn9iFpu7DDRsGanjw+9w4i1bjKM/H6XRwy803M2HCBBmhE0G2b9/OE3/5M9t37OSCLk3ckFMflqGfT61qvkAyXC19XwA+2RHL7F0xdO7cid/+7vecddZZYTl2uBgyjl8pZQc+Bt7VWn/S8nCpUirrkK6eslDWYDSlFKNHj2b48OHMnDmT1994E+umGXjScvB0+wHaHh0ti5MK+LGX5RNTsgZ8HsaPH89tt91GaqpcBxFpevfuzUsvT+X111/nww8/oKDayX1nVhs+8ieYyhstvLAxke01VsaPH8+9994bNa38tghZ8KvmMU+vAZu11s8c8tRMYDLwdMvtjFDVEEnsdjvXXnstF110EW+//TafTJ+O88AOmjqfjafTWdF7ArjlxG1M0UporOacIUO47957I3rmQgEOh4O77rqL4cOH85c/P86fvlPcklPHeV06/kJG35XbeSU/EWWP4fHHH2Xs2LFGlxR2oRzOOQZYAqyneTgnwGPAcuAjIBvYDUzUWu8/0b46clfP8ezZs4eXXnqJb775BpzxNHY5B19a35D3/4ezq8dSV07M3jwstSV0757NPffc3eHGQQuo
rKzkiSf+wurVa7igSxM35dZjD8G591B39QQ0fLw9hs92xZKb05c/Pf5nunSJ7uVYDT25e7qiMfhbrV27ludfeIHCLVvQcWk0dh2KP6lryI4XjuBXTTU4i77Dvn8HiUnJ3HH7bVx22WVy4rYD8/v9vP7667z77rv0TfJz/1nVpDiDmx2hDP56r+LFTQmsq7Qzfvx4HnjgARyO6F96VObqiVCDBg3i5ZdeYv78+Uyd+gplW75sPgHcbRiBuMie0/tIytuEo3gNjop8HDY71918M9dffz1xcXFGlyZOk9Vq5c477yQnJ4en//oUf/zOyoNnVdE7MfL7/fc1WHh2fTIVTTZ++csHueKKK4wuyXAS/BHAYrFw4YUXct555zFjxgzefOttrJtm4E3rg7vrELQz3ugST8zvw1G6EVfpevB7GT9+PLfeemvEL0YhTt35559PdnY2v3n01zy1WjGlfw3DMz1Gl3VcG/fb+OfGJBwxCTz796c4++yzjS4pIshVMhHE4XAwceJEPnj/PW644QZia3aTsOFjnHvywBeBv1xaY6soJGHjxziLvmPU8KG88cYbPPzwwxL6Uax11E9Ov/48vyGBz3dH5lKPS/Y5+dvaJDK7ZPPiy1Ml9A8hwR+BEhISmDJlCu+++y4XX3QhjtINJG74GHt5AejImEvdUldGXP4sYnYsoV/P7jz33HM89dRTHX7WQtE2KSkpPPPss5x//vl8sDWOdwpjCURI+GsNn+6I4ZXN8Qw+5we88K8XycrKMrqsiCJdPREsMzOT3/zmN1xzzTX84x//ZOPGb3CW59PQYzSBuHRDalLeRpx7VmCv3EZKahp3P/YYF154oUyxYEJOp5M//OEPZGZm8tFHH1HnsTBlQB02A98KAQ3vFcby1d4YLrroIh555JEOOblaqEnwdwD9+vXj+ef/yfz583n+hX+hNn+GJ/NM3F3PCd8U0Fpjq9xG7N4VWAJeJt10U3N3lIkuehFHs1gs3HPPPaSkpPDyyy/jDijuPbMWhwFLUwQ0vJ4fx+J9LiZOnMg999wjQ4ePQ5ppHYRSinHjxvHO229xxeWX4yjdQMKmGVjqK0J/cJ+bmK3ziNmxmH59e/Hqq69yxx13SOiLgyZNmsSDDz7I6goHL2xMwBfmHkl9SOhPnjxZQv8kJPg7mPj4eH75y1/y97//nbQ4B3H5s7GX5ROqs2uWunISNs/EWVvEPffcwwvPP0+vXr1CcizRsV199dU89NBDrK5w8Orm+LD1+WsN72+NZfE+F7fccgu33nqrhP5JSPB3UIMHD+a1115l2NAhuHYtxbl7WdDD33ZgF3EFn5ORGMPzzz/PT37yE6xWWV5SHN9VV13FnXfeydJSJ9N3xITlmPOLnMzZE8OPf/xjbr311rAcs6OT4O/AkpOT+T9PP81PfvITHGWbce38OmijfmyV24jZNp9+OTm8+sor9O/fPyj7FdHvhhtu4JJLLmHGzlhWV4T2HNS2ahvTtsYzYsRw7r//fmnpt5EEfwdnsVi4++67mTx5MvaKQhxFq097n9baEmJ2LGHg2QN59tlnSEpKCkKlwiyUUjz00EP07duHV/MTqfeGJoy9AXhpcyIZGZn89re/k5Flp0B+UlFAKcWtt97K+PHjce5bi+1A+xetUZ4G4rYvpEuXLJ566kk5gSvaxel08utfP0q9V4Wsy2fObhelDYqHf/UIiYmJITlGtJLgjyI///nP6d2nDzF7loHf1659OPeuxBrw8uQTTxAfH+FTRYiIlpOTw/jLL+e/RTFUuYPb6nf7YdaeOM4991yGDj1qDjJxEhL8UcTpdPLgAw+Aux5H6cZTfr2loRJ75VYmTpwgI3dEUEyYMIGAhm9Lg7vC2qpyB41emDhxYlD3axYS/FFm4MCBDBk6FGdFwSmP8rGX5WN3OLjhhhtCVJ0wmx49epDTty+rKoIb/N9VOEhPS2XQoEFB3a9ZSPBHofGXXQbuOqy1JW1/UcCP88BOzh87
loSEhNAVJ0wnt18/9jUGd3TPvkY7Obn95IRuO8lPLQqNHDkSi8WCtaaoza+x1FegfW7GjBkTwsqEGWVlZVHj1niCOHX//iYrnTp1Ct4OTUaCPwrFxsaSm5uLra60za9p3Vb+dBbB5vV6AYI6eZvdCj5f+wYwCAn+qJWbm4utqarN/fyWxgOkpqWTnJwc0rqE+VRXVxNrV1iCOLAn1haguro6eDs0GQn+KNWzZ0+0143yNbVpe2tTNb1kLn0RAoWFW+gW5w3qPrvGeCjcUhDUfZqJBH+U6ty5MwDKXdum7a3eerp0kcUqRHC53W62FhbSKyG4wd870UdJaRn79+8P6n7NQoI/SrWe+LJ46k++ccCH9jSSmZkZ4qqE2axfvx63x8uZKcEN/jNTm/eXl5cX1P2ahQR/lGpd81Z5G0+6bes26enGrOolotfy5cuxW6B/kIM/O95PkrN5/+LUyQpcUSoxMRGL1YryNhz2eCA29ahtW4M/NfXo54Q4Hcu+XcoZyR6cQZ7N26JgYEoTeSuW4/P5sNkkyk6FtPijlMViISkx6agWvzt7JO7skYdv2/LhkJKSErb6RPQrKytjz94izk4Lbmu/1dlpXmrr6tmyZUtI9h/NJPijWFpaGpZT6Opp7R4SIhg2bdoEQG5SaII/N9l32HFE20nwR7H09DSsvjYEv6cBpZTMuy+CauvWrVhVc398KKQ6AyS7mo8jTo0EfxTr1KkTFk/dSbezeOpIS0uXflIRVAcOHCDBqYJ6xe6Rkh0BqqqqQneAKCXBH8WysrLQXjf43CfczuKuI0vG8Isga2howGUN7YrrToufhoaGk28oDiPBH8Wys7MBsDSd4NJ2rbG5q+nZo0eYqhJmkZqaSrUntGvg1nhtMhqtHST4o1jrYirWhsrjbqO8DWhvkyy8IoKuU6dONHp10FffauX2Q0Wjklk620GCP4p17tyZhMQkrHXlx93GWlcGQP/+/cNVljCJ1iURV1c4QrL/jfvteAMwbNiwkOw/mhkS/EqpS5RSBUqprUqpR42owQyUUgwaeDb2+tLjztJprS3B4XDSt2/fMFcnol2vXr3o2iWLxSUxp7oYXJss2eciPi5WphJvh7AHv1LKCrwAXAoMACYppQaEuw6zGDJkCDTVHneyNkftPgYNHoTdHtwVkoRQSnHd9ZPYVm1lw/7gvr9211r5rsLBtRMmynu3HYxo8Q8Htmqtt2utPcAHwFUG1GEKI0aMAMBWtfuo51RTNTRWMWrkyKOeEyIYLr30UjIz0vlgezy+wIm3zY73kR1/8sVVtIb3tsYTFxvDhAkTglSpuRgR/F2BPYfc39vy2GGUUlOUUiuVUivLy4/fRy1OrEuXLvTo2RP7MYLfdqD5sVGjRoW7LGESdrudBx58iD21Fj7fHXPCbW/KbeCm3JMPzVy8z8mmAzZ+dtfdsj50OxkR/Mc6xX9UD6DWeqrWeqjWemhGRkYYyope548di7W25Kh5exxVu+ibk0NWlozhF6EzevRoLrjgAqbvjGVHzenN1lbWaOG9bfEMHHg2l19+eZAqNB8jgn8v0P2Q+92AYgPqMI2xY8cCYDuw8+Bjyl2Hpa6MC84/35iihKk8+OCDpKam8a9NSTS2c6lcXwD+tTERqyOWxx77LRaLDEpsLyN+cnlAjlKql1LKAVwPzDSgDtPo1asXWV26Yj+w6+Bjtqrmr8877zyjyhImkpSUxO//8EfKGy28kR/frlE+H26LZXuNlUd+/ejBFeZE+4Q9+LXWPuA+4EtgM/CR1npjuOswE6UU5489D2ttCfg9ANirdtM9O5vu3buf5NVCBMfAgQO5/Y47WFbmZF6R85Rem1fm4Ms9MVxzzTXSWAkCQ/5W0lp/rrXO1Vr30Vo/aUQNZjNy5EjQAWzVxeD3YK0rZczo0UaXJUxm0qRJjBg+nPe2xrOrtm39/eWNFl4rSKBfv1zuvvvuEFdoDtJJZhJnnnkmTpcLa00x1tpSCATkikcR
dhaLhd889hhJScm8uCkJ90lmbPYH4KVNiWBz8cc//knG7AeJBL9J2Gw2Bg4ciL2+FGttCVarlQED5Lo5EX7Jyck89rvfU1yv+Pe22BNu+8UeF4XVVn7xy4fp0qVLmCqMfhL8JjKgf39UwwFstSX07t0bl8tldEnCpIYMGcJVV13F3L0xbKs+9joQJQ0Wpu+M44djxnDhhReGucLoJsFvIjk5OQBY68vJzc01uBphdlOmTCE9PZW3C489yue9rXE4nDE88OCDYa8t2knwm0jr/PxHfi2EEeLi4rjjzp+xo8ZKXvnhM3gWVNlYU+HgxptuJj093aAKo5cEv4kceoWu9JeKSHDhhRfSs0c2n+6MO6zVP2NnLGkpyVxzzTXGFRfFJPhN5NAREWlpaQZWIkQzq9XKT667nr11Fgqqmvv699Vb2LDfztXXXCvnoUJEgt9kfvSjcXTqnCUXbomIMW7cOOLjYlm8r/miriUlLqxWC+PHjze4suh17NPpImr94Q+/N7oEIQ7jdDoZde5ovl00F3+gnlUVLgYNGiRr6YaQtPiFEIYbPXo0dR5YWe6guF5x7rlyVXkoSfALIQx31llnATC7Zc7+s88+28hyop4EvxDCcOnp6XTr2oWdtTYS4uPo06eP0SVFNenjF0JEhBdfepmKigpSUlKw2SSaQkl+ukKIiJCQkCBLKYaJdPUIIYTJSPALIYTJSPALIYTJSPALIYTJSPALIYTJSPALIYTJSPALIYTJKH2spW8ijFKqHNhldB1RJB2oMLoIIY5B3pvB1UNrnXHkgx0i+EVwKaVWaq2HGl2HEEeS92Z4SFePEEKYjAS/EEKYjAS/OU01ugAhjkPem2EgffxCCGEy0uIXQgiTkeAXQgiTkeA3EaXUJUqpAqXUVqXUo0bXI0QrpdTrSqkypdQGo2sxAwl+k1BKWYEXgEuBAcAkpdQAY6sS4qA3gUuMLsIsJPjNYziwVWu9XWvtAT4ArjK4JiEA0FovBvYbXYdZSPCbR1dgzyH397Y8JoQwGQl+81DHeEzG8gphQhL85rEX6H7I/W5AsUG1CCEMJMFvHnlAjlKql1LKAVwPzDS4JiGEAST4TUJr7QPuA74ENgMfaa03GluVEM2UUu8D3wL9lFJ7lVK3G11TNJMpG4QQwmSkxS+EECYjwS+EECYjwS+EECYjwS+EECYjwS+EECYjwS9MRSn1W6XURqXUOqXUGqXUiCDs88pgzXaqlKoLxn6EOBEZzilMQyk1CngGOF9r7VZKpQMOrfVJr2BWStlaroUIdY11Wuv4UB9HmJu0+IWZZAEVWms3gNa6QmtdrJTa2fIhgFJqqFJqYcvXf1JKTVVKfQW8rZRarpQ6s3VnSqmFSqkhSqmfKqWeV0oltezL0vJ8rFJqj1LKrpTqo5Sao5T6Tim1RCl1Rss2vZRS3yql8pRSfwnzz0OYlAS/MJOvgO5KqS1KqX8ppca24TVDgKu01jfQPJX1TwCUUllAF631d60baq2rgbVA636vAL7UWntpXkT8fq31EOBh4F8t2zwHvKi1HgaUnPZ3KEQbSPAL09Ba19Ec5FOAcuBDpdRPT/KymVrrxpavPwImtnz9E+Dfx9j+Q+C6lq+vbzlGPHAu8G+l1BrgZZr/+gAYDbzf8vU7p/L9CNFeNqMLECKctNZ+YCGwUCm1HpgM+Pi+EeQ64iX1h7y2SClVqZQaSHO4/+wYh5gJ/FUplUrzh8x8IA6o0loPPl5Z7ftuhGgfafEL01BK9VNK5Rzy0GBgF7CT5pAGuPYku/kAeARI0lqvP/LJlr8qVtDchTNLa+3XWtcAO5RSE1vqUEqpQS0v+YbmvwwAbjzlb0qIdpDgF2YSD7yllNqklFpH89rDfwIeB55TSi0B/CfZx39oDuqPTrDNh8BNLbetbgRuV0qtBTby/bKXDwD3KqXygKRT+3aEaB8ZzimEECYjLX4hhDAZCX4hhDAZCX4hhDAZCX4hhDAZCX4hhDAZCX4h
hDAZCX4hhDCZ/w+eglXa7IVCxQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
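   ],
   "source": [
    "# Visual analysis of the age feature\n",
    "sn.violinplot(x='Survived', y='Age', data=train)  # violin plot of age by survival outcome\n",
    "for a in data:\n",
    "    # Note: pd.cut derives the bin edges from each dataset separately, so train and test bins differ slightly\n",
    "    a['Age_'] = pd.cut(a['Age'].astype(int), 4)  # split age into four equal-width bins\n",
    "    a['age'] = label.fit_transform(a['Age_'])  # new encoded feature 'age' from the bin label"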
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEGCAYAAACKB4k+AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAaDElEQVR4nO3df5BV5Z3n8fdHQDBiokJjgMY0STAVCNoOLeqwZhlNhDUumCmRZjaKqw5WhB2ylcoMZCuKTlFlZUwcy2gqJDpgNLQkxoUwiRNCxJTRFbsZRH7IQgYHWihpcELErCjtd/+4h8MNXJoL9Lmnu+/nVdV1z3nu85z+3i7oT59fz1FEYGZmBnBa3gWYmVnX4VAwM7OUQ8HMzFIOBTMzSzkUzMws1TvvAk7FwIEDo66uLu8yzMy6lZaWlj0RUVPqvW4dCnV1dTQ3N+ddhplZtyLp34/1ng8fmZlZyqFgZmYph4KZmaW69TkFs5Px/vvv09rayrvvvpt3KaesX79+1NbW0qdPn7xLsR7CoWBVp7W1lbPOOou6ujok5V3OSYsI9u7dS2trK8OHD8+7HOshMj98JKmXpH+VtDxZP1fSCklbktdzivrOlbRV0mZJE7KuzarTu+++y4ABA7p1IABIYsCAAT1ij8e6jkqcU5gNbCpanwOsjIgRwMpkHUkjgUZgFDAReFhSrwrUZ1WouwfCIT3lc1jXkWkoSKoFvgD8oKh5MrAoWV4EXFfU3hQRByJiG7AVGJtlfWZm9qey3lP4R+BvgQ+K2s6LiF0AyeugpH0osKOoX2vS9ickzZDULKm5ra0tk6Kt+syfP59Ro0Zx4YUXUl9fz0svvXTK21y2bBn33ntvJ1QH/fv375TtmB1PZieaJV0L7I6IFknjyxlSou2oJwBFxAJgAUBDQ8NR74/52mMnVmgHWv7hpk7blnVdL774IsuXL2fNmjX07duXPXv28N5775U19uDBg/TuXfq/0aRJk5g0aVJnlmqWuSz3FMYBkyS9DjQBV0p6HHhT0mCA5HV30r8VGFY0vhbYmWF9ZgDs2rWLgQMH0rdvXwAGDhzIkCFDqKurY8+ePQA0Nzczfvx4AObNm8eMGTO4+uqruemmm7j00kvZsGFDur3x48fT0tLCwoULmTVrFvv27aOuro4PPijsMP/xj39k2LBhvP/++/zud79j4sSJjBkzhiuuuILXXnsNgG3btnH55ZdzySWX8I1vfKOCPw2rdpmFQkTMjYjaiKijcAL51xHxJWAZMD3pNh1YmiwvAxol9ZU0HBgBrM6qPrNDrr76anbs2MEFF1zAHXfcwXPPPXfcMS0tLSxdupQf/ehHNDY2smTJEqAQMDt37mTMmDFp34985CNcdNFF6XZ/9rOfMWHCBPr06cOMGTN48MEHaWlp4b777uOOO+4AYPbs2Xz5y1/m5Zdf5qMf/WgGn9qstDzuaL4X+LykLcDnk3UiYgOwBNgIPAPMjIj2HOqzKtO/f39aWlpYsGABNTU1TJ06lYULF3Y4ZtKkSZxxxhkA3HDDDfz4xz8GYMmSJUyZMuWo/lOnTuXJJ58EoKmpialTp7J//35eeOEFpkyZQn19Pbfffju7du0C4Le//S3Tpk0D4MYbb+ysj2p2XBW5eS0iVgGrkuW9wFXH6DcfmF+JmsyK9erVi/HjxzN+/HhGjx7NokWL6N27d3rI58h7Ac4888x0eejQoQwYMIB169bx5JNP8r3vfe+o7U+aNIm5c+fy1ltv0dLSwpVXXsk777zD2Wefzdq1a0vW5MtNLQ+e+8iq3ubNm9myZUu6vnbtWj72sY9RV1dHS0sLAE899VSH22hsbOSb3/wm+/btY/To0Ue9379/f8aOHcvs2bO59tpr6dWrFx/+8IcZPnx4upcREbzyyisAjBs3jqamJgCeeOKJTvmcZuVwKFjV279/P9OnT2fkyJFceOGFbNy4kXnz5nHXXXcxe/ZsrrjiCnr1
6vg+yuuvv56mpiZuuOGGY/aZOnUqjz/+OFOnTk3bnnjiCR555BEuuugiRo0axdKlhVNsDzzwAA899BCXXHIJ+/bt65wPalYGRRx1VWe30dDQEEc+ZMeXpNrxbNq0iU9/+tN5l9FpetrnsexJaomIhlLveU/BzMxSDgUzM0s5FMzMLOVQMDOzlEPBzMxSDgUzM0v5cZxmJXTmpc1Q3uXNzzzzDLNnz6a9vZ3bbruNOXPmdGoNZuXwnoJZF9De3s7MmTP5xS9+wcaNG1m8eDEbN27MuyyrQg4Fsy5g9erVfPKTn+TjH/84p59+Oo2NjendzWaV5FAw6wLeeOMNhg07/DiR2tpa3njjjRwrsmrlUDDrAkpNN+NZUi0PDgWzLqC2tpYdOw4/ory1tZUhQ4bkWJFVK4eCWRdwySWXsGXLFrZt28Z7771HU1OTn+9sufAlqWYlVHqG3N69e/Od73yHCRMm0N7ezi233MKoUaMqWoMZZBgKkvoBvwH6Jt/nJxFxl6R5wF8DbUnXr0fEz5Mxc4FbgXbgbyLiX7Kqz6yrueaaa7jmmmvyLsOqXJZ7CgeAKyNiv6Q+wPOSfpG8d39E3FfcWdJIoBEYBQwBfiXpAj+n2cyscjI7pxAF+5PVPslXR0/0mQw0RcSBiNgGbAXGZlWfmZkdLdMTzZJ6SVoL7AZWRMRLyVuzJK2T9Kikc5K2ocCOouGtSduR25whqVlSc1tb25Fvm5nZKcg0FCKiPSLqgVpgrKTPAN8FPgHUA7uAbyXdS12UfdSeRUQsiIiGiGioqanJpG4zs2pVkUtSI+L3wCpgYkS8mYTFB8D3OXyIqBUYVjSsFthZifrMzKwgs1CQVCPp7GT5DOBzwGuSBhd1+yKwPlleBjRK6itpODACWJ1VfWZmdrQsrz4aDCyS1ItC+CyJiOWSfiipnsKhodeB2wEiYoOkJcBG4CAw01ceWV623zO6U7d3/p2vHrfPLbfcwvLlyxk0aBDr168/bn+zLGQWChGxDri4RPuNHYyZD8zPqiazruzmm29m1qxZ3HRTZW+cMyvmaS7MuojPfvaznHvuuXmXYVXOoWBmZimHgpmZpRwKZmaWciiYmVnKU2eblVDOJaSdbdq0aaxatYo9e/ZQW1vL3Xffza233lrxOqy6ORTMuojFixfnXYKZDx+ZmdlhDgUzM0s5FKwqRXT0aI/uo6d8Dus6HApWdfr168fevXu7/S/UiGDv3r3069cv71KsB/GJZqs6tbW1tLa20hMe0tSvXz9qa2vzLsN6EIeCVZ0+ffowfPjwvMsw65J8+MjMzFIOBTMzSzkUzMws5VAwM7NUls9o7idptaRXJG2QdHfSfq6kFZK2JK/nFI2ZK2mrpM2SJmRVm5mZlZblnsIB4MqIuAioByZKugyYA6yMiBHAymQdSSOBRmAUMBF4OHm+s5mZVUhmoRAF+5PVPslXAJOBRUn7IuC6ZHky0BQRByJiG7AVGJtVfWZmdrRMzylI6iVpLbAbWBERLwHnRcQugOR1UNJ9KLCjaHhr0nbkNmdIapbU3BNuPjIz60oyDYWIaI+IeqAWGCvpMx10V6lNlNjmgohoiIiGmpqaTqrUzMygQlcfRcTvgVUUzhW8KWkwQPK6O+nWCgwrGlYL7KxEfWZmVpDl1Uc1ks5Ols8APge8BiwDpifdpgNLk+VlQKOkvpKGAyOA1VnVZ2ZmR8ty7qPBwKLkCqLTgCURsVzSi8ASSbcC24EpABGxQdISYCNwEJgZEe0Z1mdmZkfILBQiYh1wcYn2vcBVxxgzH5ifVU1mZtYx39FsZmYph4KZmaUcCmZmlnIomJlZyqFgZmYph4KZmaUcCmZmlnIomJlZyqFgZmYph4KZmaUcCmZmlnIomJlZyqFgZmYph4KZmaUcCmZmlnIomJlZyqFgZmapLJ/RPEzSs5I2SdogaXbSPk/SG5LWJl/XFI2ZK2mrpM2SJmRVm5mZlZblM5oPAl+NiDWSzgJaJK1I
3rs/Iu4r7ixpJNAIjAKGAL+SdIGf02xmVjmZ7SlExK6IWJMsvw1sAoZ2MGQy0BQRByJiG7AVGJtVfWZmdrSKnFOQVAdcDLyUNM2StE7So5LOSdqGAjuKhrVSIkQkzZDULKm5ra0ty7LNzKpO5qEgqT/wFPCViPgD8F3gE0A9sAv41qGuJYbHUQ0RCyKiISIaampqsinazKxKZRoKkvpQCIQnIuKnABHxZkS0R8QHwPc5fIioFRhWNLwW2JllfWZm9qeyvPpIwCPApoj4dlH74KJuXwTWJ8vLgEZJfSUNB0YAq7Oqz8zMjpbl1UfjgBuBVyWtTdq+DkyTVE/h0NDrwO0AEbFB0hJgI4Url2b6yiMzs8rKLBQi4nlKnyf4eQdj5gPzs6rJzMw65juazcws5VAwM7OUQ8HMzFIOBTMzS5UVCpJWltNmZmbdW4dXH0nqB3wIGJhMR3HoaqIPU5i0zszMepDjXZJ6O/AVCgHQwuFQ+APwUHZlmZlZHjoMhYh4AHhA0v+IiAcrVJOZmeWkrJvXIuJBSX8O1BWPiYjHMqrLzMxyUFYoSPohhZlN1wKHpp4IwKFgZtaDlDvNRQMwMiKOmsrazMx6jnLvU1gPfDTLQszMLH/l7ikMBDZKWg0cONQYEZMyqcrMzHJRbijMy7IIMzPrGsq9+ui5rAsxM7P8lXv10dscfl7y6UAf4J2I+HBWhXUF2+8Z3WnbOv/OVzttW2ZmWSl3T+Gs4nVJ13H42cpmZtZDnNQsqRHxv4ErO+ojaZikZyVtkrRB0uyk/VxJKyRtSV7PKRozV9JWSZslTTiZ2szM7OSVe/joL4tWT6Nw38Lx7lk4CHw1ItZIOgtokbQCuBlYGRH3SpoDzAH+TtJIoBEYRWGupV9JusDPaTYzq5xyrz76r0XLB4HXgckdDYiIXcCuZPltSZuAocm48Um3RcAq4O+S9qaIOABsk7SVwiGqF8us0czMTlG55xT++6l8E0l1wMXAS8B5SWAQEbskDUq6DQX+T9Gw1qTtyG3NAGYAnH/++adSlpmZHaHch+zUSnpa0m5Jb0p6SlJtmWP7A08BX4mIP3TUtUTbUYeoImJBRDRERENNTU05JZiZWZnKPdH8T8AyCsf6hwI/S9o6JKkPhUB4IiJ+mjS/KWlw8v5gYHfS3goMKxpeC+wssz4zM+sE5YZCTUT8U0QcTL4WAh3+mS5JwCPApoj4dtFby4DpyfJ0YGlRe6OkvpKGAyOA1WXWZ2ZmnaDcE817JH0JWJysTwP2HmfMOOBG4FVJa5O2rwP3Aksk3QpsB6YARMQGSUuAjRROZs/0lUdmZpVVbijcAnwHuJ/Ccf4XgA5PPkfE85Q+TwBw1THGzAfml1mTmZl1snJD4e+B6RHxH1C4AQ24j0JYmJlZD1HuOYULDwUCQES8ReESUzMz60HKDYXTjpiO4lzK38swM7Nuotxf7N8CXpD0EwrnFG7Ax/7NzHqccu9ofkxSM4VJ8AT8ZURszLQyMzOruLIPASUh4CAwM+vBTmrqbDMz65kcCmZmlnIomJlZyqFgZmYph4KZmaUcCmZmlnIomJlZyqFgZmYph4KZmaUcCmZmlnIomJlZKrNQkPSopN2S1he1zZP0hqS1ydc1Re/NlbRV0mZJE7Kqy8zMji3LPYWFwMQS7fdHRH3y9XMASSOBRmBUMuZhSb0yrM3MzErILBQi4jfAW2V2nww0RcSBiNgGbAXGZlWbmZmVlsc5hVmS1iWHlw49zW0osKOoT2vSdhRJMyQ1S2pua2vLulYzs6pS6VD4LvAJoB7YReGJblB4cM+RotQGImJBRDRERENNTU0mRZqZVauKhkJEvBkR7RHxAfB9Dh8iagWGFXWtBXZWsjYzM6twKEgaXLT6ReDQlUnLgEZJfSUNB0YAqytZm5mZncDjOE+UpMXAeGCgpFbgLmC8pHoKh4ZeB24HiIgNkpZQeNznQWBmRLRnVZuZmZWWWShExLQSzY900H8+MD+r
eszM7Ph8R7OZmaUcCmZmlnIomJlZyqFgZmYph4KZmaUcCmZmlnIomJlZyqFgZmYph4KZmaUcCmZmlnIomJlZyqFgZmYph4KZmaUcCmZmlsps6mzL3/Z7Rnfats6/89VO25aZdV3eUzAzs5RDwczMUpmFgqRHJe2WtL6o7VxJKyRtSV7PKXpvrqStkjZLmpBVXWZmdmxZ7iksBCYe0TYHWBkRI4CVyTqSRgKNwKhkzMOSemVYm5mZlZBZKETEb4C3jmieDCxKlhcB1xW1N0XEgYjYBmwFxmZVm5mZlVbpcwrnRcQugOR1UNI+FNhR1K81aTuKpBmSmiU1t7W1ZVqsmVm16SqXpKpEW5TqGBELgAUADQ0NJft0Z2O+9linbevpszptU2ZWJSq9p/CmpMEAyevupL0VGFbUrxbYWeHazMyqXqVDYRkwPVmeDiwtam+U1FfScGAEsLrCtZmZVb3MDh9JWgyMBwZKagXuAu4Flki6FdgOTAGIiA2SlgAbgYPAzIhoz6o2MzMrLbNQiIhpx3jrqmP0nw/Mz6oeMzM7Pt/RbGZmKYeCmZmlHApmZpZyKJiZWcqhYGZmKYeCmZmlHApmZpZyKJiZWcqhYGZmKYeCmZmlHApmZpZyKJiZWcqhYGZmKYeCmZmlHApmZpZyKJiZWcqhYGZmqcyevNYRSa8DbwPtwMGIaJB0LvAkUAe8DtwQEf+RR31mZtUqzz2Fv4iI+ohoSNbnACsjYgSwMlk3M7MK6kqHjyYDi5LlRcB1+ZViZlad8gqFAH4pqUXSjKTtvIjYBZC8Dio1UNIMSc2Smtva2ipUrplZdcjlnAIwLiJ2ShoErJD0WrkDI2IBsACgoaEhsirQzKwa5bKnEBE7k9fdwNPAWOBNSYMBktfdedRmZlbNKh4Kks6UdNahZeBqYD2wDJiedJsOLK10bWZm1S6Pw0fnAU9LOvT9fxQRz0h6GVgi6VZgOzAlh9rMzKpaxUMhIv4NuKhE+17gqkrXY2Zmh3WlS1LNzCxnDgUzM0s5FMzMLOVQMDOzlEPBzMxSDgUzM0s5FMzMLOVQMDOzlEPBzMxSDgUzM0s5FMzMLOVQMDOzVF4P2bEeaszXHuu0bbX8w02dtq1q0Jk/e/DPv1o5FKzL2n7P6E7b1vl3vtpp2zLryRwKZkW8p2PVzucUzMws5T0FM7NO0FP2MrtcKEiaCDwA9AJ+EBH35lySmVVAT/ml2t11qVCQ1At4CPg80Aq8LGlZRGzMtzKzE9fdT5R39/rt5HS1cwpjga0R8W8R8R7QBEzOuSYzs6qhiMi7hpSk64GJEXFbsn4jcGlEzCrqMwOYkax+CticYUkDgT0Zbj9rrj9frj8/3bl2yL7+j0VETak3utThI0Al2v4ktSJiAbCgIsVIzRHRUInvlQXXny/Xn5/uXDvkW39XO3zUCgwrWq8FduZUi5lZ1elqofAyMELScEmnA43AspxrMjOrGl3q8FFEHJQ0C/gXCpekPhoRG3IsqSKHqTLk+vPl+vPTnWuHHOvvUieazcwsX13t8JGZmeXIoWBmZimHwjFImihps6StkubkXc+JkPSopN2S1uddy4mSNEzSs5I2SdogaXbeNZ0ISf0krZb0SlL/3XnXdDIk9ZL0r5KW513LiZL0uqRXJa2V1Jx3PSdK0v9M/u2sl7RYUr9Kfn+HQglF0238F2AkME3SyHyrOiELgYl5F3GSDgJfjYhPA5cBM7vZz/4AcGVEXATUAxMlXZZvSSdlNrAp7yJOwV9ERH13u1dB0lDgb4CGiPgMhQtuGitZg0OhtG493UZE/AZ4K+86TkZE7IqINcny2xR+MQ3Nt6ryRcH+ZLVP8tWtruaQVAt8AfhB3rVUqd7AGZJ6Ax+iwvdqORRKGwrsKFpvpRv9YuopJNUBFwMv5VzKCUkOvawFdgMrIqJb1Q/8I/C3wAc513GyAvilpJZkWpxuIyLeAO4DtgO7gH0R8ctK1uBQKO24021YtiT1B54C
vhIRf8i7nhMREe0RUU/hjvyxkj6Tc0llk3QtsDsiWvKu5RSMi4g/o3D4d6akz+ZdULkknUPhqMRwYAhwpqQvVbIGh0Jpnm4jR5L6UAiEJyLip3nXc7Ii4vfAKrrX+Z1xwCRJr1M4bHqlpMfzLenERMTO5HU38DSFw8HdxeeAbRHRFhHvAz8F/rySBTgUSvN0GzmRJOARYFNEfDvvek6UpBpJZyfLZ1D4T/5arkWdgIiYGxG1EVFH4d/9ryOion+pngpJZ0o669AycDXQna7C2w5cJulDyf+Fq6jwCX+HQgkRcRA4NN3GJmBJztNtnBBJi4EXgU9JapV0a941nYBxwI0U/kJdm3xdk3dRJ2Aw8KykdRT+uFgREd3uss5u7DzgeUmvAKuBf46IZ3KuqWzJ+aefAGuAVyn8jq7olBee5sLMzFLeUzAzs5RDwczMUg4FMzNLORTMzCzlUDAzs5RDwawMkv5XMnPluuQy2Usl/eDQZH2S9h9j3GWSXkrGbJI0r6KFm52gLvU4TrOuSNLlwLXAn0XEAUkDgdMj4rYyhi8CboiIV5LZdz+VZa1mp8p7CmbHNxjYExEHACJiT0TslLRKUjo1s6RvSVojaaWkmqR5EIWJzQ7NibQx6TtP0g8l/VrSFkl/XeHPZFaSQ8Hs+H4JDJP0fyU9LOk/l+hzJrAmmYjtOeCupP1+YLOkpyXdfsQDUy6kMEX15cCdkoZk+BnMyuJQMDuO5PkIY4AZQBvwpKSbj+j2AfBksvw48J+SsfcADRSC5a+A4ikXlkbE/4uIPcCzdK+J26yH8jkFszJERDuFGU9XSXoVmH68IUVjfwd8V9L3gTZJA47sc4x1s4rznoLZcUj6lKQRRU31wL8f0e004Ppk+a+A55OxX0hmuwQYAbQDv0/WJyfPdB4AjKcwgZ5ZrrynYHZ8/YEHkymxDwJbKRxK+klRn3eAUZJagH3A1KT9RuB+SX9Mxv63iGhPcmI18M/A+cDfH3oOgFmePEuqWQ6S+xX2R8R9eddiVsyHj8zMLOU9BTMzS3lPwczMUg4FMzNLORTMzCzlUDAzs5RDwczMUv8fIm+jzU6D9DcAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
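    "# Visualize siblings/spouses aboard (SibSp) against survival\n",
    "sn.countplot(x='SibSp',hue=\"Survived\",data=train)  # effect of SibSp on survival\n",
    "for a in data:\n",
    "    # Bucket SibSp into three groups: 's' (none), 'm' (1-2), 'l' (3 or more)\n",
    "    a['SibSp_']=a['SibSp'].map(lambda x:'s' if x<1 else 'm' if x<3 else 'l')\n",
    "    a['sibsp']=label.fit_transform(a['SibSp_'])  # encode the buckets as the new feature sibsp"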
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build the fare feature\n",
    "for a in data:\n",
    "    a['Fare_']=pd.cut(a['Fare'].astype(int),4)  # split fares into four equal-width bands\n",
    "    a['fare']=label.fit_transform(a['Fare_'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Cabin feature\n",
    "for a in data:\n",
    "    # 'yes' if a cabin number is recorded, 'no' if it is missing\n",
    "    a['Cabin_']=a['Cabin'].map(lambda x:'yes' if isinstance(x,str) else 'no')\n",
    "    a['cabin']=label.fit_transform(a['Cabin_'])"
   ]
  },
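  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The binning and encoding pattern used in the feature cells above can be illustrated on a tiny hand-made frame. This is only a sketch: the `toy` DataFrame and its values are made up for illustration and are not rows from the Titanic files. It also swaps in `pd.cut(..., labels=False)`, which returns the bin index directly, in place of label-encoding the intervals; the two give the same codes only when every bin is occupied."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from sklearn.preprocessing import LabelEncoder\n",
    "\n",
    "# Hypothetical toy data, not taken from the Titanic files\n",
    "toy = pd.DataFrame({'Fare': [7.25, 71.28, 8.05, 512.33],\n",
    "                    'Cabin': [None, 'C85', None, 'B51']})\n",
    "\n",
    "# Four equal-width bands over the observed fare range; labels=False yields the band index\n",
    "toy['fare'] = pd.cut(toy['Fare'].astype(int), 4, labels=False)\n",
    "\n",
    "# A recorded cabin string becomes 'yes', a missing value becomes 'no'\n",
    "toy['Cabin_'] = toy['Cabin'].map(lambda x: 'yes' if isinstance(x, str) else 'no')\n",
    "# LabelEncoder assigns integer codes in sorted order of the labels: 'no' -> 0, 'yes' -> 1\n",
    "toy['cabin'] = LabelEncoder().fit_transform(toy['Cabin_'])\n",
    "\n",
    "print(toy[['fare', 'cabin']])"
   ]
  },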
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Feature selection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [],
   "source": [
    "target=['Survived']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_code1=['sex','sibsp','fare','Parch','Pclass','embarked','cabin']  # use the newly built features (sex, etc.)\n",
    "columns=target+data_code1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Building the training and test splits"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 224,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Split the labelled training data (default split: 75% train, 25% test)\n",
    "x_train,x_test,y_train,y_test=model_selection.train_test_split(train[data_code1],\n",
    "                                                              train[target],\n",
    "                                                              random_state=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 225,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(668, 7)"
      ]
     },
     "execution_count": 225,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x_train.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 226,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(668, 1)"
      ]
     },
     "execution_count": 226,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_train.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 154,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "     sex  sibsp  fare  Parch  Pclass  embarked  cabin\n",
      "105    1      2     0      0       3         2      0\n",
      "68     0      0     0      2       3         2      0\n",
      "253    1      1     0      0       3         2      0\n",
      "320    1      2     0      0       3         2      0\n",
      "706    0      2     0      0       2         2      0\n",
      "..   ...    ...   ...    ...     ...       ...    ...\n",
      "835    0      1     0      1       1         0      1\n",
      "192    0      1     0      0       3         2      0\n",
      "629    1      2     0      0       3         1      0\n",
      "559    0      1     0      0       3         2      0\n",
      "684    1      1     0      1       2         2      0\n",
      "\n",
      "[668 rows x 7 columns]\n"
     ]
    }
   ],
   "source": [
    "print(x_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Random forest implementation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import GridSearchCV\n",
    "from sklearn.ensemble import RandomForestClassifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {},
   "outputs": [],
   "source": [
    "rf=RandomForestClassifier(max_features='sqrt',  # 'sqrt' is what the older 'auto' setting meant for classifiers\n",
    "                          random_state=1,\n",
    "                          n_jobs=-1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "metadata": {},
   "outputs": [],
   "source": [
    "pram_grids={'criterion':['gini','entropy'],\n",
    "            \"min_samples_leaf\":[1,5,10,15],\n",
    "            'min_samples_split':[2,6,8,12,20],  # an integer value must be at least 2\n",
    "            'n_estimators':[50,100,400],\n",
    "           }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "metadata": {},
   "outputs": [],
   "source": [
    "gs=GridSearchCV(estimator=rf,param_grid=pram_grids,scoring='accuracy',cv=3,n_jobs=-1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Training on the features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "D:\\Program Files (x86)\\python\\lib\\site-packages\\sklearn\\model_selection\\_search.py:765: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().\n",
      "  self.best_estimator_.fit(X, y, **fit_params)\n"
     ]
    }
   ],
   "source": [
    "gs=gs.fit(x_train,y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 128,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.8009130206439624\n"
     ]
    }
   ],
   "source": [
    "print(gs.best_score_)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 129,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'criterion': 'gini', 'min_samples_leaf': 5, 'min_samples_split': 20, 'n_estimators': 400}\n"
     ]
    }
   ],
   "source": [
    "print(gs.best_params_)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Feature optimization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Refit a forest with fixed hyperparameters (these differ from the gs.best_params_ printed above)\n",
    "rf2=RandomForestClassifier(criterion='gini',\n",
    "                           min_samples_leaf=15,\n",
    "                           min_samples_split=2,\n",
    "                           n_estimators=100,\n",
    "                           random_state=1,\n",
    "                           n_jobs=-1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "<ipython-input-117-29c66dd7728d>:1: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().\n",
      "  gs2=rf2.fit(x_train,y_train)\n"
     ]
    }
   ],
   "source": [
    "rf2.fit(x_train,y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 135,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>variable</th>\n",
       "      <th>importance</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>sex</td>\n",
       "      <td>0.555072</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Pclass</td>\n",
       "      <td>0.194064</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>cabin</td>\n",
       "      <td>0.126811</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>embarked</td>\n",
       "      <td>0.042267</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Parch</td>\n",
       "      <td>0.041952</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>sibsp</td>\n",
       "      <td>0.037067</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>fare</td>\n",
       "      <td>0.002768</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   variable  importance\n",
       "0       sex    0.555072\n",
       "4    Pclass    0.194064\n",
       "6     cabin    0.126811\n",
       "5  embarked    0.042267\n",
       "3     Parch    0.041952\n",
       "1     sibsp    0.037067\n",
       "2      fare    0.002768"
      ]
     },
     "execution_count": 135,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Rank the features by importance\n",
    "pd.concat((pd.DataFrame(x_train.iloc[:,:].columns,columns=['variable']),\n",
    "          pd.DataFrame(rf2.feature_importances_,columns=['importance'])),\n",
    "         axis=1).sort_values(by=\"importance\",ascending=False)\n",
    "# Sex is the most important feature; fare is the least important"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Predicting on the test set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 227,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_predict=rf2.predict(test[data_code1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 190,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_survived=pd.DataFrame(test_predict,columns=['survived'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 197,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   survived\n",
       "0         0\n",
       "1         0\n",
       "2         0\n",
       "3         0\n",
       "4         0"
      ]
     },
     "execution_count": 197,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test_survived.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 213,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>survived</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>892</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>893</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>894</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>895</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>896</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PassengerId  Survived  survived\n",
       "0          892         0         0\n",
       "1          893         1         0\n",
       "2          894         0         0\n",
       "3          895         0         0\n",
       "4          896         1         0"
      ]
     },
     "execution_count": 213,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Load the provided reference file and attach the model's predictions\n",
    "submission=pd.read_csv(\"gender_submission.csv\")\n",
    "submission['survived']=test_survived['survived']\n",
    "submission.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 220,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.9306\n"
     ]
    }
   ],
   "source": [
    "# Agreement rate with the reference file\n",
    "a=submission['survived']\n",
    "b=submission['Survived']\n",
    "c=(a==b).sum()  # number of matching predictions\n",
    "accuracy=c/len(submission)  # matches divided by the number of test passengers (418)\n",
    "print(round(accuracy,4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
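    "### Compared with the provided `gender_submission.csv`, the predictions agree on roughly 93% of the test passengers, so the model behaves sensibly overall. Note that this file is Kaggle's sample submission, which assumes that all and only female passengers survive, so the figure measures agreement with that baseline rather than accuracy against the true outcomes."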
   ]
  },
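  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A held-out accuracy estimate can complement the comparison above, because it scores predictions against labels the model never saw during fitting. The sketch below shows the pattern on synthetic stand-in data: the arrays `X` and `y` and the rule that generates the label are assumptions made up for illustration, not the Titanic features. In this notebook the equivalent call would be `accuracy_score(y_test, rf2.predict(x_test))`, since `x_test` and `y_test` come from the earlier `train_test_split` and are otherwise unused."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.metrics import accuracy_score\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "# Synthetic stand-in data (assumption): 7 integer-coded features and a binary label\n",
    "rng = np.random.RandomState(0)\n",
    "X = rng.randint(0, 3, size=(400, 7))\n",
    "y = (X[:, 0] + X[:, 4] >= 3).astype(int)  # made-up rule that defines the label\n",
    "\n",
    "# Hold out 25% of the rows (the default) for validation\n",
    "X_fit, X_val, y_fit, y_val = train_test_split(X, y, random_state=0)\n",
    "\n",
    "model = RandomForestClassifier(n_estimators=100, random_state=1)\n",
    "model.fit(X_fit, y_fit)  # y is one-dimensional, so no DataConversionWarning is raised\n",
    "\n",
    "# Fraction of held-out rows whose predicted label matches the true label\n",
    "val_accuracy = accuracy_score(y_val, model.predict(X_val))\n",
    "print(round(val_accuracy, 3))"
   ]
  },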
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Reflections"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
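    "### Reflection 1\n",
    "Python's machine-learning libraries are genuinely convenient. Although I do not fully understand the underlying theory of the neural networks and linear regression covered in class, once I understood the parameters an algorithm takes and the input format it expects, applying it in practice was straightforward. This suits beginners like us with limited programming experience. I was proud of the final result and could hardly believe I had completed it.\n",
    "\n",
    "### Reflection 2\n",
    "In real decision and prediction problems, the hardest part is analysing, processing, and selecting the data, not applying the machine-learning algorithm. How should missing values be filled? Which columns are irrelevant and should be dropped? How should the existing data be turned into useful features, and how can it be processed quickly in bulk? These questions deserve careful thought because they have the largest effect on the final predictions: only well-chosen features combined with a suitable algorithm give the most accurate results.\n",
    "\n",
    "### Reflection 3\n",
    "The Jupyter editor has many advantages. The biggest is its Markdown support, which effectively merges Word with a code editor and makes the report's line of reasoning very clear. It accepts many kinds of input, including formulas, special characters, images, and hyperlinks, and I now have a basic grasp of these. My coding speed has also improved, especially through keyboard shortcuts that allow fast input without the mouse.\n"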
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
